Notations


The following notations are defined at their first occurrence within the text. They
will be reused without reminder afterward. Very generally, random variables will
be denoted using capital letters, while their realizations will be denoted with lowercase
ones. Vectors and matrices will be written in bold type, unlike scalars.
realization of X

Cumulative distribution function of X 

Probability density function of X 
X

Space of parameter vector θ
Prior cumulative distribution function

Posterior cumulative distribution function 
Variance of X
Chapter 1
Extreme Events and History: For a
Better Consideration of Natural Hazards


Emmanuel Garnier 


Abstract This preface aims to remind us of the fundamental importance, for our 
modern but fragile societies, of the difficult study of historical data on natural hazards. 
Forgetting is a regular feature of our decisions, and the preservation of the memory of 
natural disasters depends heavily on our ability to quantify their effects. The author, 
a climate historian, recalls here some fundamental aspects of this discipline and its 
necessary connection to statistical approaches to risk analysis. 


At first glance it may seem surprising, if not debatable, to ask a historian to introduce
a work entitled Extreme Value Theory with Applications to Natural Hazards writ- 
ten by seasoned and experienced engineers. However, the book’s contributors have 
indeed judged it sound to bestow me with the great honor of writing the introduction 
to this educational collective work. 

De facto, this book is based on two major observations. First, the obvious scarcity 
of textbooks on extreme value statistics, and directly related to this, the difficulties 
encountered by students and engineers in improving their expertise in this strategi- 
cally important domain. The second observation—deo gratias!—legitimizing to a 
certain extent the presence of a historian, is an acknowledgement of the fragility of 
statistical methods when forced to confront unpredictable events, like for instance a 
beaver dam in Quebec which made a mockery of patiently developed hydrological 
models and led to a disastrous flood. 

However, by diving deeply into the archival veins of ancient societies, the historian 
injects a plethora of “unpredictable events” that can potentially disrupt models, a
reality that explains the sustained interest in the world of reinsurance for this type of 
information [316]. 

Is more proof needed to convince you? In 2008, as I was conducting research in 
a renowned French climatology laboratory, I shared with my colleagues and other 


E. Garnier (✉)
Research Professor CNRS,
University of Besançon, Besançon, France
e-mail: egarnier.cea-cnrs@orange.fr


© Springer Nature Switzerland AG 2021
N. Bousquet and P. Bernardara (eds.), Extreme Value Theory with Applications 
to Natural Hazards, https://doi.org/10.1007/978-3-030-74942-2_1 




disaster management specialists the historical risk of a volcanic eruption affecting 
the European continent. Deeply disconcerted, my interlocutors politely helped me 
understand that this was extremely unlikely and that such an idea probably came from 
an over-interpretation of historical sources which were, “by definition, unreliable”. 
I evoked the terrible precedent of the Icelandic volcano Laki in the summer of 1783, 
which killed tens of thousands of people, but they were still not convinced. Merely 2
years later, the Eyjafjöll volcano erupted, with well-known significant economic and
social consequences [52, 309].


1.1 A Neglected Gold Mine: Historical Archives 


To study extreme events of the past, historians use direct and indirect data concur- 
rently. The former refers to archival sources, which describe, if one is interested in 
the climate, the weather conditions on a given date. Though this information comes 
mainly from instrumental measurements—like meteorological readings in personal 
diaries or by scientists—it also contains precise descriptions of extreme weather 
events. In contrast, the latter represents the influence of meteorology and geology on 
the natural components of the biosphere and the hydrosphere. More specifically, it 
provides indirect information on floods, droughts, storms, volcanoes, earthquakes, 
log-jams, and vegetation. 

Numerous and varied phenological sources allow us to study the influence of cli- 
mate variation on annual or periodic phenomena related to vegetation (germination, 
flowering, and dates at which fruit ripen) and the animal world (birdsong, migration, 
nest-making, etc.). Such records, existing from the thirteenth century on in France, 
haphazardly jumble things like official grain, grape, and other fruit (apples, olives, 
etc.) harvest dates. Although these are certainly not forgotten by paleoclimatologists, 
their use remains open to discussion. Their users are frequently criticized for not tak- 
ing into account a number of fundamental anthropic parameters such as cultural 
changes and geopolitical and epidemiological contexts. 

It is at levels of uncertainty like this that the historian—an archival specialist par 
excellence—intervenes to “separate” time series data from its social imprint. For 
instance, the contextualization of the grape harvest banns of the city of Besançon (cf. 
Figure 0.1) between 1525 and 1850 proved that military events and plague epidemics 
played a decisive role in the grape harvest dates [318]. Also, we should not forget 
that the use of phenology to reconstruct climates of the past is limited by seasonality: 
primacy is given to seasons in which plants grow best. 

An inexhaustible source for Clio’s disciples, administrative archives concurrently
offer two major advantages: chronological continuity (generally from the fifteenth 
century) and homogeneity of information. Town archives contain first-hand sources 
on municipal debates and city events in which catastrophes are omnipresent. Any 
extreme event had an impact on infrastructure (bridges, mills, canals, etc.) and the 
fragile socio-economic balance of a town. It therefore provoked debate, as well as 
decision-making, on the part of elected representatives. Moreover, urban sources 
solve the problem of seasonal representativeness. Indeed, unlike in the countryside,
the off-season does not exist in towns, and elected representatives discuss summer 
droughts just as much as winter ice-jams. 

Engaged in a centralization phase, the Princely States were endowed with new 
bureaucratic tools as early as the seventeenth century. The process is particularly evi- 
dent in France, where the monarchy created specialized archive-producing adminis- 
trations. The windthrow register (trees uprooted by wind), maintained since the 1560s 
by the forestry administration, as well as the reports written by the Admiralty of the 
Royal Navy along the coast, describing extreme maritime events (storms, submer- 
sions, tsunamis, etc.), are two important examples of such specialized archives. The 
first modern meteorological observations in Europe begin to be collected at this time, 
with the founding of the Observatoire de Paris (1669) and its sister institution, the 
Académie royale des sciences, and a century later the Société royale de médecine in
France (1778), the Royal Society of London, and the Societas Meteorologica Palatina 
in Mannheim [308]. 

Additional sources such as personal diaries and family ledgers can also be used. 
These too mention catastrophic events that have occurred. More surprisingly per- 
haps, archives of the Catholic and Protestant churches around the world provide a 
very wide range of extraordinary documents. These include references to rain and 
other (wished-for) weather processions, depending on the climate of the time, which 

Fig. 1.1 Comparison of droughts in France (right-hand side y-axis: quantified according to the
Historical Severity Drought Scale (HSDS)) with the annual rainfall in Paris (left-hand side y-axis:
in mm) in the period 1689-2013. Source Garnier [316]

provide information on severe events like droughts, floods, and extreme cold. Another
very fruitful track is parish registers in which the priest or pastor recorded deaths. 
These make it possible to apprehend the demographic effect of a crisis or the impact 
of a storm on a church steeple. Purely Catholic, the ex-voto translates into visual 
form past catastrophes and the social perception of them. 

More than mere commemorative paintings, the point of these at a time of strong 
religious practice was to recall persisting dangers (maritime, volcanic, and climatic) 
to the community [613]. Such ecclesiastical sources can inform natural hazard 
researchers about often unexpected parts of the world. Would it surprise you that, 
for example, it is now quite possible (if the financial means are there) to reconstruct 
a history of natural disasters in China from the archives of French missionaries? 
In its highly bureaucratic way, the Catholic church generated thousands of reports 
between the 1600s and the mid-twentieth century. Well-informed in the sciences, 
priests carried out systematic meteorological surveys and recorded all the disasters 
(tsunamis, submersions, earthquakes, etc.) that they had witnessed (Fig. 1.1). 


1.2 Turning Words into Figures


In a step-by-step way, the process consists of tracking in archives—as any histo- 
rian does—non-instrumentally based information appearing in the form of direct or 
indirect data, then “converting” it into a measure of climate variation (temperature, 
precipitation, and barometric pressure) during the selected period. This is where the
most daring scientific challenge lies, and extreme care is required, because this
predominantly word-based information must be transcribed into actual time
series.

This process can only be implemented starting with the creation of a database 
in which all information brought together is organized and placed in chronological 
order so as to offer an overall view of the results, and furthermore, facilitate statistical 
analyses. Nevertheless, it is important to note that to be useful, these data must be 
collected at relevant geographic scales (by region, for instance). Otherwise, biases 
related to climatic and topographic variability are likely to lead to aberrant time 
series. To date, around 100 000 text-based and instrumentally based data points have 
been collected. Far from being definitive, this database will grow due to progress and 
projects in the years to come. 

One item of data here corresponds to one historical reference but not necessarily 
one data point, as the same event can easily involve two or more phenomena. For 
instance, a violent storm may be accompanied by heavy precipitation. From left to 
right in a given row of the database presented in Table 1.2, we have the event, the 
precise source, the estimated reliability of the source, the place, the year, the exact 
date when indicated, and lastly the text itself, in order that any researcher can see in 
detail the original source when needing to retrieve accurate information or reinterpret 
the content. 
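
As a rough illustration of the row layout just described, one record of such a database might be sketched as follows. The field names, the 1-to-3 reliability coding, and the sample values are hypothetical placeholders, not the actual schema or contents of the database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArchiveRecord:
    """One row: a single phenomenon reported by a single historical reference."""
    event: str                  # type of phenomenon, e.g. "storm"
    source: str                 # precise archival reference
    reliability: int            # estimated reliability (hypothetical 1-3 coding)
    place: str
    year: int
    exact_date: Optional[date]  # None when the source gives no exact date
    text: str                   # original wording, kept for later reinterpretation


def split_reference(events, **common):
    """Return one record per phenomenon mentioned in the same reference."""
    return [ArchiveRecord(event=e, **common) for e in events]


# Placeholder values for illustration only (not taken from the real database).
rows = split_reference(
    ["storm", "heavy precipitation"],
    source="(archive reference)",
    reliability=2,
    place="(place)",
    year=1784,
    exact_date=date(1784, 1, 17),
    text="(transcribed passage)",
)
rows.sort(key=lambda r: (r.year, r.event))  # chronological order, then event type
```

Keeping the original text in every row is what allows a later reader to reinterpret an entry without returning to the archive.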
Table 1.1 Severity categories and associated level of damage for historical floods before the arrival
of measurement instruments

Index  Level of damage

5      Very severe event in terms of geographic magnitude (local, regional, national),
       financial and societal cost (shortages, riots, large numbers of deaths)

4      Significant regional damage: hydraulic infrastructure, bridges, farms, cattle, harvests,
       communication lines

3      Significant damage but limited to a few localities or a city: costs, shortages, drowned
       cattle, human deaths

2      Light and local damage: a few villages, agricultural land, wetlands

1      Event mentioned in sources, limited and localized damage

-1     Mentioned in historical sources but nothing more


After this preliminary step, it remains to “derive” values from these archives by 
transforming the basic data into meteorological severity indices (temperature and 
precipitation) or seismic measures (the EMS¹ scale), for instance, which make it
possible to interweave time-marked text sources with instrument data [313]. To do 
this, the descriptive content of written archives can be used to establish an intensity 
scale for catastrophic events of the past, provided that historians, engineers, modelers, 
and/or insurers work well as a team. Taking the example of storm surge, it is easy 
to compare an event of the past with its modern-day counterpart by looking at the 
damage caused to infrastructure and society, such as the destruction of dunes and dykes,
flooded areas, and even loss of life [317] (Table 1.1). In this way, the intensity
indexing system makes it possible to draw regular curves over extremely long periods
of time, meaning that comparisons can now be considered statistically relevant
(Fig. 1.2).
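
The indexation step can be sketched in a few lines: each dated record receives a severity index (here the flood categories of Table 1.1), and the indexed events are grouped into fixed windows to obtain a long series. The event list below is invented purely for illustration, and the function name and the 50-year window are assumptions rather than the author's procedure.

```python
# Severity categories of Table 1.1 (abbreviated labels).
FLOOD_SEVERITY = {
    5: "very severe event",
    4: "significant regional damage",
    3: "significant damage limited to a few localities or a city",
    2: "light and local damage",
    1: "limited and localized damage",
    -1: "mentioned in sources, nothing more",
}


def series_by_period(events, start, end, width=50):
    """Count events and keep the maximum severity per window of `width` years.

    `events` is an iterable of (year, severity) pairs. Windows begin at
    `start` and cover the years up to `end` (exclusive).
    """
    summary = {p: {"count": 0, "max_severity": None}
               for p in range(start, end, width)}
    for year, severity in events:
        if severity not in FLOOD_SEVERITY:
            raise ValueError(f"unknown severity index: {severity}")
        if not start <= year < end:
            continue  # outside the studied period
        period = start + ((year - start) // width) * width
        cell = summary[period]
        cell["count"] += 1
        if cell["max_severity"] is None or severity > cell["max_severity"]:
            cell["max_severity"] = severity
    return summary


# Invented (year, severity) pairs, for illustration only.
toy_events = [(1512, 2), (1530, 4), (1561, -1), (1598, 5), (1640, 3)]
toy_series = series_by_period(toy_events, start=1500, end=1650)
```

A series built this way is only as reliable as the severity assigned to each record, which is why the indexation is best done jointly by historians and modelers.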


1.3 Some Extreme Hazards in the Light of History 


In practice, historical documents are riddled with “monstrosities” and “disturbances”, 
expressions used in the past to designate what we now call extreme hazards or natural 
disasters. And for good reason: calling an event catastrophic, both now and in the past, 
implies that humans were or have been impacted. On the climate front, our decision- 
makers, media, and even a part of the scientific community are constantly asserting 
that we will experience an upsurge and intensification of natural disasters in the future. 
Of all the uncertainties and interrogations raised by this debate, a major question 
emerges: can recent extreme events be studied with a view to comparing them to 
historical events, in order to more accurately assess social, economic, political, and 
cultural consequences? To the joy of historians, the extreme nature of these types of 
events has left a trace across historic sources, like coins falling one by one from a 


¹ European macroseismic scale, which measures the strength and severity of earthquakes.
Fig. 1.2 History of storm surge impact on the North Norfolk coast (the United Kingdom, cf.
Fig. 0.2). The impact score (y-axis) is based on many types of damage, such as flooding of buildings, 
loss of life, crops, and livestock, flooded areas, damage to infrastructure, and financial cost. Source 
[317] 


pocket with a hole in it, enough even to reconstruct an extremely robust history of 
the worst natural events of the past five centuries. 


1.3.1 Continental Storms 


Based on archival analysis, the verdict of history refutes the comments of those 
who, in the aftermath of the huge European storm of December 1999, insisted on its 
exceptional wind strength. During France’s Ancien Régime, for example, it would 
have been more appropriate to speak of the “storm of the century”, as the elements 
tend to be unleashed with regularity due to climatic conjunctures at given times. 
The five hundred years (1500-2000) represented in Fig. 1.3 thus reveal a chronology 
of contradictory peaks and troughs in the number of severe events recorded. Over 
the period as a whole, 22 events between storm (force 10) and hurricane (force 12) 
strength affected northern Europe, with a return period of about 15 years, with 80% of 
them occurring in the months of January, February, and March. However, we should 
be careful not to draw any general conclusions from this, because the apparent return 
periods do not take into account geographical and temporal variability. 
Fig. 1.3 Severe storms (force ≥ 10) in Europe from 1500 to 2000. Source Garnier [316]


Nevertheless, certain periods appear more critical than others; for instance, records 
of the forestry administration indicate windthrow years clustered around 1580-1640, 
1710-1770, and 1870-1920. If we consider only the most severe events, France was 
touched by 15 storms, whose force was between 10 and 12, during this period. The 
early modern period (sixteenth and seventeenth centuries) is a poor sibling—though 
gaps in the archives are probably numerous—compared to the last century of the 
Ancien Régime (before 1789), where eight disasters were recorded, five in the short 
period 1720-1760. This increase in wind speed was not entirely a matter of chance, 
however. It suffices to cross-reference it with the global climate context of the time 
to see that it developed during a relative warming period accompanied by strong 
low-pressure activity. 

Subsequently, the meteorological situation continued to improve despite a few 
returning cold snaps, induced by the last counter-offensives of the Little Ice Age in 
the 1780s. However, the end of the eighteenth century was not exempt from severe 
storms, as evidenced by the events of January 17, 1784, which devastated the Atlantic 
coast. The national repercussions were such that the prestigious Société royale de 
médecine carried out an investigation in the days following the event. It resulted in 
the drafting of a comprehensive report on meteorological conditions and damage 
suffered at the time. It must be said that on this day, Aeolus had a particularly heavy 
hand, causing the sinking of some 30 ships and the death of around 50 people [310]. 

In the following century, in a Europe engaged in the industrial revolution, France 
had an impressive sequence of violent storms and hurricanes that ravaged both the 
north and the south in the years 1842, 1869, 1872, 1876, and 1894. Nevertheless, 
major natural catastrophes were in fact more numerous in the eighteenth century (9 
events) than in the twentieth (7 events), even when taking into account the violent
Lothar and Martin windstorms of 1999. 


1.3.2 Droughts 


Drought has always marked rural and urban societies and is regularly mentioned in 
historical sources. For the record, note that the general term “drought” covers several 
different things, though in its most frequent usage, it describes a lack of rainfall. 

In this sense of the word, and being unable to work with variables giving the devi- 
ation from mean or normal rainfall—non-existent before the middle of the eighteenth 
century—the usual approach consists in relying on quantitative mentions found in 
sources, like duration, frequency, and geographic extent. This is acceptable given that 
today, both for the World Meteorological Organization and Météo-France, drought 
severity is characterized by the number of days without rain. Archives have, however, 
the immense advantage of not normally introducing ambiguity, typically specifying 
both the beginning and end dates (most often religious) of dry periods, as well as 
numerous mentions like “great hurry to mill because of the low water” and indications
that “fish are dying in the rivers because of the drought”. It would be wrong
though to exaggerate by suggesting that droughts mentioned in the archives correspond
to a total absence of precipitation. Such droughts, uncovered by historians,
simply refer to long and extremely dry episodes that severely affected societies and
economies [312].

In the case of France, 68 droughts were recorded between 1500 and 2014. These 
are quite unevenly distributed, chronologically speaking, and correspond to a wide 
range of levels of severity (Fig. 1.4). 

The disparities are even more striking when these events are looked at over 50- 
year periods. Three main trends can be seen since the sixteenth century. The first 
is between 1500 and 1700, characterized by a sustained number of droughts with 
on average one every 8 years. A big change occurs in the eighteenth century, which 
sees a dramatic increase in the number of droughts, 21 in all, with an impressive 
peak visible in the first half of the century (14 droughts). During this time, a drought 
occurred on average once every 3.5 years, which is not even close to being matched 
over the rest of the 500 years studied. Never before 1700 or after 1800 was this kind 
of frequency observed. This string of dry episodes is particularly well documented 
in archives and in the proceedings of the Académie royale des sciences in Paris. 
Indeed, the scientists of the Observatoire de Paris were particularly concerned about 
this long dry and hot period. For this reason, a large number of instrument data and 
observations are readily available for today’s historians [308]. 
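
The frequencies quoted above follow from a simple ratio: the length of the period divided by the number of droughts recorded in it. A minimal sketch of this arithmetic, using only the counts given in the text (the function name is ours):

```python
def mean_interval(period_years, n_events):
    """Average number of years between events over a period of given length."""
    if n_events <= 0:
        raise ValueError("at least one event is needed")
    return period_years / n_events


# First half of the eighteenth century: 14 droughts in 50 years,
# i.e. roughly one drought every 3.5 years.
first_half_18th = mean_interval(50, 14)

# Eighteenth century as a whole: 21 droughts in 100 years.
whole_18th = mean_interval(100, 21)

# Conversely, one drought every 8 years over 1500-1700 (200 years)
# implies about 25 recorded events.
implied_1500_1700 = 200 / 8
```

Such averages describe a whole window and say nothing about how the events cluster within it.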

The nineteenth and twentieth centuries are characterized by more frequent 
droughts, even though the Little Ice Age faded away from the 1850s on. Never- 
theless, the twentieth century has witnessed a real upsurge, though without it being 
possible to observe a turning point from 1950 onwards. Remember, however, that this 

Fig. 1.4 Chronology and severity of droughts in France from 1500 to 2014 according to the
Historical Severity Drought Scale. Source Garnier [316]


increase is notable only in the context of a 200-year comparison. The level remains 
much below the records of the eighteenth century. 

The available British records allow us to identify 42 droughts of varying severity 
on the Historical Severity Drought Scale (HSDS). The chronology thus reconstructed 
(Fig. 1.5) shows very large differences from one century to the next, both in terms of 
frequency and severity. 

First, the distribution of extreme events by half-century shows the great arid 
phases of Great Britain’s history since the 1500s. There is a clear break between the 
period before 1700 and all that comes after. In the former, only 11 events occurred, 
whereas there were 31 in the latter. This low number before 1700 has two possible 
explanations. The first is simply gaps in the archives, while the second—probably the 
more relevant—is linked to the climate context of the period. Indeed, the sixteenth 
and seventeenth centuries corresponded in Europe to a very severe phase of the Little 
Ice Age, with many more humid and cool seasons, especially between 1540-1640 
and 1683-1693. 
Fig. 1.5 Chronology and severity of droughts in Great Britain from 1500 to 2014 according to the
Historical Severity Drought Scale. Source Garnier [316] 


After the 1700s, droughts appear to be much more frequent. The first period 
corresponds to the years 1700-1750 during which the European continent underwent 
a new set of climatic conditions. From 1705 until the end of the 1730s, the climate 
was warmer and drier, although 1709 still remains today the year of the “great
frost” in Europe. Moreover, several droughts like those of 1714, 1715, and 1740 
began during very cold and dry winters. The second break occurs after 1800 with 
an increased frequency of droughts, most probably attributable to the beginning of 
global warming during 1830-1850. Henceforth, every 50-year period suffered six to 
seven drought events. We note in passing that there has been no increase in this since 
the 1950s. 

Lastly, monthly precipitation data from Oxford, collected since the eighteenth 
century, can be brought together with the HSDS drought assessment (Fig. 1.6). There 
are good matches between the historical events recorded in the archives and periods 
of rainfall deficit. Nevertheless, droughts of severity 5 do not consistently correspond 
to the lowest precipitation. For instance, the extremely severe droughts of 1785 and 
1976 do not coincide with the lowest precipitation levels in Oxford. Reasons for this 
Fig. 1.6 Plot of British droughts (measured on the HSDS scale) superimposed on precipitation in
Oxford (right-hand side y-axis: in mm) between 1767 and 2012. Source Garnier [316] 


may include local rainfall conditions which may be different from the rest of the 
country, and other meteorological factors—like wind—that may have exacerbated 
droughts [314]. 


1.3.3 Storm Surges 


On February 28, 2010, at around 2 am, Cyclone Xynthia struck France’s Atlantic 
coast, mainly between the south of Brittany and Bordeaux, causing human and mate- 
rial losses concentrated in Vendée and Charente-Maritime. Dykes, dunes, and other 
structures were overcome, resulting in the flooding of more than 50,000 hectares and 
around 50 deaths. Among the reasons for the high human and material cost, the fail- 
ure to take historical experience into account—in particular, the North Sea disaster 
of 1953—played a major role, thus increasing the exposure of coastal societies to 
such events (Table 1.2). 
Fig. 1.7 Distribution of storm surges by century near Gravelines (North Sea, cf. Fig. 0.1). Source
Garnier (2018) 


A historical approach refutes the idea that a catastrophe of the February 2010 
magnitude was totally unexpected. Indeed, between 1550 and 2010, 117 such events 
are recorded in the archives for the French coastline. Among them, 30 hit the Atlantic 
coast exclusively. More interesting are the average return times calculated from 
these time series. Once again, these are relatively homogeneous—between 14 and 
19 years—with an estimated period of 15 years on the Atlantic coast [317, 375]. 

Focusing on the Atlantic coast alone, the chronology shows wide disparities across 
500 years. Contrary to what one might think, the last 50 years have not seen an upsurge 
in this kind of extreme event. The most catastrophic century is the eighteenth, with 
nine storm surge events, whereas the twentieth century totaled only five (see, for 
example, Fig. 1.7). It also seems that the apogee of the Little Ice Age in the seventeenth
century resulted in a lower frequency of such extreme events (three only). During the 
last 100 years, six storm surge events struck France’s Atlantic regions, but notably, 
the great majority occurred between 1924 and 1957. 

Receiving the full force of Xynthia, the town of La Faute-sur-Mer did not discover 
the dangers of the sea only in 2010. At least seven submersions have affected the 
town since 1711, an average return time of 42 years (Fig. 1.8). More astonishing is 
the fact that the municipality experienced three storm surges in the first half of the 
twentieth century (1928, 1937, and 1941), rather recent events, which once again 
seem to have been totally ignored when developing the tourist potential of the town 
[315]. 
Fig. 1.8 Storm surge events at La Faute-sur-Mer between 1700 and 2010 in terms of impact score.
This score is based on many types of damage, such as flooding of buildings, loss of life, crops, and 
livestock, flooded areas, damage to infrastructure, and financial cost. Source [317] 


1.4 Lessons of History 


The paradigm of civilization as presented by the philosopher Montesquieu in the 
eighteenth century postulates that the more a society evolves, the more it can protect 
itself from the consequences of natural disasters. The archives examined here tend 
to prove the contrary, underlining the capacity of pre-industrial societies to preserve 
and transmit the memory of natural catastrophes, and beyond that, draw lessons from 
experience to improve resilience, using visual means like ex-votos, more resilient 
landscape features such as hedgerows, and simply by not constructing directly on 
the coastline, among others. More trivially, a simple question needs to be asked: can 
historical examples be used to strengthen the capacity of contemporary society to 
protect itself against extreme events? 

Though the experience of the past should not be idealized, it must be taken into 
account and used to design prevention strategies based on adaptation ideas devel- 
oped by our predecessors. For them, the risk was not a fatality, but rather a state of 
expectation leading to anticipation of possible crises. They therefore kept alive the 
memory of past catastrophes and tried to be ready for ongoing threats. Indeed,
communities, from medieval times to the first half of the twentieth century, developed
flexible and effective alert/prevention systems, enabling people to take refuge, either 
upstairs in their houses or in safe areas, known from the past to be protected from 
flooding. Unfortunately, such teaching has been lost due to a break in the continuity 
of passed-on memories, demographics (rural exodus, the attraction of the seaside), 
and loss of technical know-how in the post-World War II context. While empirical, 
these ways of the elders illustrate the passing on of memories, based on the ability 
of a group to acquire and develop, over time, knowledge of its own survival legacy. 
Chapter 2
Introduction


Nicolas Bousquet and Pietro Bernardara 


Abstract This introduction recalls the considerable socio-economic challenges 
associated with extreme natural hazards. The possibilities of statistical quantifica- 
tion of past hazards and extrapolation, offered by extreme value theory, make it an 
essential tool to improve risk mitigation. The objectives of this book, answering the 
questions of engineers, decision-makers, researchers, professors, and students, are 
briefly presented. 


2.1 Major Economic and Societal Challenges 


Understanding natural hazards and the industrial risks they lead to is a topic of great 
importance to society. With all its human, economic, and ecological dimensions, 
society has repeatedly found itself vulnerable to “extreme” natural events. Floods, 
storms, heavy rains, and the like, have the ability to significantly weaken or even 
destroy civil engineering works and industrial sites. High or extremely low tem- 
peratures have the potential to induce heavy rain [59] or drastically reduce water 
resources at production sites where machines need to be cooled [482, 764]. Indus- 
trial accidents provoked by extreme natural events can in turn disrupt the habitats and 
ecosystems of a whole region. As an example, strong concerns have recently been
expressed about the robustness of aging American dams to extreme precipitation, the 
destruction of which would have colossal socio-economic and environmental costs 
[478]. The potentially cumulative effect of more than one extreme event is especially 
feared, as it may lead to even worse consequences. For instance, a storm might cut 
off access to a site that needs to be protected from a second event linked to the first, 
such as flooding caused by heavy rain. 

The quantification of extreme natural hazards, in terms of probability of occur- 
rence and associated costs, became a topic of major interest in the mid-twentieth cen- 
tury. The catastrophic flood of February 1953, simultaneously in the Netherlands, 
England, and Belgium (over 1 800 dead [451]) alerted the scientific and political
worlds to the importance of being able to estimate the risk associated with observed 
extreme natural events and even to be able to extrapolate risk assessment to different 
or more extreme scenarios. From the last years of the twentieth century, extensive 
media coverage of such events, like the mudslides in Venezuela in 1999 (over 30 000 
deaths [175]) and the storms Lothar and Martin which hit Europe in December 1999 
(and partially flooded the Blayais nuclear power plant in Gironde), has again brought 
to our attention the increased vulnerability of our complex societies to environmental 
risks [111, 477]. Only 20 days after the storm Xynthia, Iceland’s Eyjafjallajökull
volcano erupted, significantly changing the perception of the risk of volcanic eruption
in Europe, at least for the aviation industry. After the Earth Summit in Rio in 1992,
this subject was finally given a more important place at the table, when it was
realized that consequences of climate change could include an increase, in severity
and frequency, of certain types of extreme natural events [111, 520, 765, 788].
Though more recent works are more conservative on the subject and suggest that 
the influence of climate change is yet to be proved for cyclones, hail, and torrential 
rain (among others), more frequent and unprecedented heatwaves [554], faster winds 
[799], and rising sea levels are already upon us [18].¹ Recent major civil and
industrial emergencies caused by extreme natural events, sometimes cumulative, such
as the Fukushima nuclear accident in March 2011 (an earthquake followed
by a tsunami), only strengthen the need for methods for quantifying extreme events
that can effectively take into account their inherent complexity.

Generally, risk mitigation, that is, the set of measures taken ex ante to mitigate 
the effect of cumulative extreme events, is of great interest to authorities [9, 96]. 
Large industries—including energy companies such as EDF—more than share these 
concerns. Indeed, it is essential for them—a legal obligation even—to ensure the 
robustness of their facilities against natural hazards, preserve production, and mini- 
mize societal consequences due to a partial or total loss of them. Such mitigation is 
generally carried out by designing site protection projects and by setting up warning 
systems and crisis management processes. These call for studies consisting of: 


(i) understanding and characterizing physical phenomena that generate environ- 
mental phenomena; 
(ii) characterizing the intensity of possible extreme events. 


It is clear however that, depending on the issues and the vulnerability of the sites to 
be protected, we may wish to completely protect them from these extreme events, or 
instead accept a certain level of risk (with zero risk equivalent to infinite cost). The 
level of risk we decide to accept is a choice which society, regulatory agencies, and 
project leaders can then use to optimize protective structure designs and mobilize 
resources. This interpretive dimension to the risk, which may involve important 
philosophical and societal concerns [70, 379, 541], is common to many domains
that make use of formal methods for quantifying risk. It is particularly true for
actuarial studies [124, 644, 666] and financial studies [249, 733], where extreme
value statistics are frequently used. We point readers with a strong interest in this
subject toward the works [108, 109] for more details.


1 Extreme events happen anyway, whether influenced by climate change or not. Moreover,
uncertainty about the effects of climate change is even higher for extreme events than for
changes in regular, average events.

In its most standard accepted form, particularly in industry, an industrial risk*²
is generally defined as the product of the probability of occurrence of a hazard and 
a measure of the cost it would induce [6]. Quantifying the vulnerability of a site 
involves various types of cost (human, goods, etc.), while the probability measure is 
a generic diagnostic tool that helps us compare diverse scenarios. 
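
As a minimal illustration of this definition, the sketch below compares two scenarios by multiplying a probability of occurrence by a cost. The scenario names, probabilities, and costs are hypothetical values chosen only for the arithmetic; they do not come from any study.

```python
# Hypothetical scenarios: (annual probability of occurrence, cost in million euros).
scenarios = {
    "frequent, moderate flood": (1e-2, 5.0),
    "rare, severe flood": (1e-4, 300.0),
}


def risk(probability: float, cost: float) -> float:
    """Risk as the product of the occurrence probability and the induced cost."""
    return probability * cost


for name, (p, cost) in scenarios.items():
    print(f"{name}: risk = {risk(p, cost):.3f} M EUR per year")
```

With these illustrative numbers, the rarer scenario carries the smaller risk despite its much higher cost, which shows how the probability measure makes dissimilar scenarios comparable.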

Rather than probabilities, in practice we often speak of the frequency of occurrence
(a probability typically being interpreted as a limiting frequency [551]). For
other reasons, given in Sect. 3.1, it is often appropriate to define a phenomenon³ as
truly random and try to characterize it using probability distributions and processes, 
themselves quantified using statistical tools. Indeed, it is quite useful to be able 
to associate a frequency of occurrence (for example, annual) to the intensity of a 
natural phenomenon. Obviously, the more extreme the phenomenon, the rarer it will 
be, and vice versa. A more practical way to express the rarity of an event is to 
use the concept of return period. This, and the frequency of occurrence, have been 
borrowed from statisticians by engineers and modified to serve their needs. Good 
practice, methodology manuals, and regulatory frameworks have therefore integrated 
these concepts. 

It logically follows that there is a need for statistical estimation methods and 
tools for analyzing the frequency of occurrence of rare events. This link between 
the statistical characterization of natural phenomena and the design of civil and 
industrial engineering works is long-established [260]. New perspectives are opening
up today: the contribution of new contextual data, sometimes in large numbers (Big
Data paradigm), the complexity of situations where risks accumulate, and the need to
take into account trends, some of which are linked to climate change, all call for
constantly improving these methods and tools and updating the regulatory frameworks
(Fig. 2.1).


2.2 A Statistical Theory of Extreme Values 


Historically the use of probability and statistics for the characterization of extreme 
natural events took off in the first half of the twentieth century, thanks to the theoretical 
work of Fréchet [291], Fisher and Tippett [283] and Gumbel [364]. They showed
that, under certain assumptions, an extreme data sample (maxima over a time
period or values exceeding a threshold) can be seen as a sample drawn from a family of
probability distributions called extreme value distributions, which can be statistically
estimated from an observed sample.


2 The asterisk indicates a term found in the glossary at its first occurrence in the text.
3 Or at least the nature of the mechanism producing the actual effects of the phenomenon. 
Fig. 2.1 Flooding of the Seine in Paris, June 2, 2016


This theory had its first real practical use following the events of 1953 mentioned 
above. Since then, the application of statistical distributions from extreme value 
theory to samples of observations of environmental variables collected at sites of 
interest has become common methodological practice. An important ingredient in 
the decision support process, it allows a scientific characterization of random natural 
hazards. Accepted by authorities, scientists and engineers worldwide, it is now the 
go-to methodology in certain regulatory frameworks [1]. 

Today, thanks to the availability of large regional and spatial databases, the capacity
of numerical models to simulate extreme events, and advances in multivariate
spatial statistics methods, the relevance of a simple application of an extreme value
distribution to a sample of observations for a given site has become more limited.
However, new approaches are invariably constructed from results that originated in 
extreme value theory. This, therefore, represents a body of theory and practical tech- 
niques that engineers responsible for characterizing random natural hazards need to 
know. 
The use of this theory is of course not without critics. Working with a statistical
distribution, necessarily estimated from a small number of observations, and even
extrapolating beyond them, poses a major epistemological problem [107]. From how
many observations, and up to what extreme intensity, can we be reasonably confident in
our extrapolation, even when accompanied by measures of uncertainty? Do we even
have the “right” (is it even science?) to conduct such an analysis?⁴ Proof of these
legitimate doubts is the existence of other approaches to extreme risk modeling,
generally founded on deterministic principles, e.g., the probable maximum precipitation
calculation, widespread in North America [803], or modeling the physics of the
phenomenon using numerical methods (see Sect. 3.2.2). A quite comprehensive list of
alternative approaches such as these can be found in [9].

Nevertheless, this book will provide theoretical and practical answers to the above- 
mentioned doubts. From an epistemological perspective, we agree with the viewpoint 
of Stuart Coles [173], who asserts that extreme value theory provides probably the 
only practical solution to the need for extrapolation based on rational, objective, and 
technical arguments: 

By definition, extreme values are scarce, meaning that estimates are often required 
for levels of a process that are much greater than have already been observed. This 
implies an extrapolation from observed levels to unobserved levels, and extreme value 
theory provides a class of models to enable such extrapolation. In lieu of an empirical 
or physical basis, asymptotic argument is used to generate the extreme value models. 
It is easy to be cynical about this strategy, arguing that the extrapolation of models 
to unseen levels requires a leap of faith, even if the models have an underlying 
asymptotic rationale. There is no simple defense against this criticism, except to say 
that applications demand extrapolation, and that it is better to use techniques that 
have a rationale of some sort. This argument is well understood and, notwithstanding 
objections to the general principle of extrapolation, there are no serious competitor 
models to those provided by extreme value theory ([173], p. vii).

To be applicable, extreme value theory imposes a strict and precise framework.
In particular, it is important to note that phenomena must be studied using
continuous and orderable variables, i.e., quantities for which
one can define an ordering for comparing and ranking purposes [107]. Furthermore,
one should not provide a point estimate of an extreme value return level
or very low probability without also providing one or several measures of uncertainty.


4 In particular, questioning the actual relevance of the concept of probability in the extreme value 
setting is a legitimate line of enquiry [107]. 
2.3 Objectives of the Book


2.3.1 A Need for a Cross-Disciplinary Approach to Extreme 
Values 


A brief literature or internet search will quickly confirm to anyone the current
popularity of extreme value theory. However, most of this content remains stuck in the
world of academia [65, 173, 261], essentially developed by mathematicians, with a 
clear lack of critical analysis of its use in modeling and estimating real-world phe- 
nomena with real-world data. From a practical point of view, probability estimation 
methods for rare events have progressed in parallel in several scientific disciplines 
(finance [485], economics [47], hydrology [429], meteorology and climatology [23], 
maritime hydraulics [394], etc.), partly in cohesion with extreme value theory, partly 
by breaking away to answer needs related to each subject’s specific requirements 
[145]. This lack of results transfer from one domain to the next may lead, on the 
one hand, to consolidation of the use of certain methods in certain domains, whose 
theoretical backing continues to be refined over time. On the other hand, potentially 
useful and important cross-disciplinary research directions may unfortunately fall 
by the wayside. 

In response, researchers and engineers from EDF R&D have for several years 
come together in a research group on extreme value statistics, focusing on defining a 
good practice framework at this subject’s scientific and practical interface. This book 
is the direct result of thoughts and discussions emanating from the research group. We 
aim to provide an in-depth look at the common methodological heart of theoretical 
and applied subjects, via a critical analysis of methods used and results obtained. It 
is clear that extreme value probability estimation studies for various disciplines
broadly share the same underlying framework with, of course, specific steps related 
to the study in question. This cross-disciplinary point of view is a specific focus 
of the first part of the book, which presents the “classical” theory of extreme value 
statistics. 


2.3.2 Which Random Events? 


Natural hazards that may impact human activities and infrastructures include a wide 
range of environmental phenomena. Here are some examples: 


• in meteorology: hot and cold air phenomena, rain, extreme wind events and tornadoes,
  snow, hail, lightning;

• in hydrology: flooding or baseflow* events, high and low water temperatures;

• certain maritime phenomena: abnormally high and low sea levels, extreme tides
  due to meteorological processes, astronomical tides, waves (chop* and swell*),
  high and low water temperatures;

• in geology and seismotectonics: earthquakes, tsunamis, volcanic eruptions and
  landslides;

• space weather: solar storms and meteorite outbreaks.


This list is inspired by regulations in the nuclear industry (e.g., [12, 15, 17, 19]), 
a particularly proactive and attentive actor in this area. However, the list is by no 
means exhaustive and only serves to give the reader an idea of phenomena that can 
be characterized using methods presented in the book. The applicability of tools 
based on this theory often demands simplification, with hazards represented by a 
limited number of variables that together vary with time. Such constraints do not 
prevent the theory being used to characterize many other natural phenomena, such 
as large wildfires [35, 724]. 


2.3.3 Responding to New Risk Perceptions


The increasing complexity of risk perception requires the development of new the- 
oretical and practical aspects of extreme value statistics. These fall into four main 
categories, mostly treated in the second part of the book, and include recent research 
results, often the product of the work of EDF’s engineers and researchers. 


1. The cumulative effect of several hazards, the importance of which we mentioned 
earlier. 

2. Taking trends into account: If there is a trend, can it increase the impact of events 
today considered extreme? How can we, in this case, modify mitigation strategies 
to take this into account over relevant time intervals? 

3. Taking advantage of spatial data and other information sources, notably for refin- 
ing results from other studies and producing new ones. Demands for increasing 
geographic and temporal precision can limit the number of directly applicable data 
samples obtained from the past. In parallel, detection and measuring instruments 
are becoming increasingly important and increasingly precise. The refinement of 
old results and the production of new risk quantification methods thus requires 
the construction and deployment of a whole arsenal of modern techniques, based 
on the use of spatio-temporal data from the neighborhood of the site in ques- 
tion, and/or any specific site expertise available. The goal is to use these varied 
sources of information in the best way possible to confront potentially high levels 
of uncertainty that could result from these statistical predictions. 

4. Lastly, the use of stochastic and numerical models is an alternative approach that 
may prove crucial when too few extreme data values are available to ensure the 
adequacy of the previous approaches. This strategy involves seeking to “mimic” 
the natural phenomenon at the source of the hazard, be it through the use of
empirical-statistical models, based on all available data (not only extreme values),
or using mechanistic models of the physics involved [767]. Simulation techniques
can then be used to replicate the phenomenon a huge number of times,
thus “detecting” extreme situations; the probability of occurrence can then be 
quantified. 
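
The simulation strategy of item 4 can be sketched as follows. This is only an illustration under loud assumptions: the "model" is a toy stand-in (independent lognormal daily intensities), and the critical level and the number of simulated years are hypothetical. A real study would replace `simulate_annual_maxima` with a calibrated stochastic or mechanistic model.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_annual_maxima(rng, n_years, n_days=365):
    """Toy stand-in for a stochastic model: annual maxima of independent
    lognormal daily intensities (a hypothetical choice, not a physical model)."""
    daily = rng.lognormal(mean=0.0, sigma=1.0, size=(n_years, n_days))
    return daily.max(axis=1)


n_years = 10_000   # number of simulated years (hypothetical)
level = 30.0       # critical level whose exceedance is of interest (hypothetical)
maxima = simulate_annual_maxima(rng, n_years)

# Monte Carlo estimate of the annual exceedance probability and its standard error.
p_hat = np.mean(maxima > level)
std_err = np.sqrt(p_hat * (1.0 - p_hat) / n_years)
print(f"estimated annual exceedance probability: {p_hat:.4f} +/- {std_err:.4f}")
```

The standard error shrinks only as the inverse square root of the number of simulated years, so very rare events require either very long simulations or more refined sampling techniques.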
2.4 Key Messages and Warnings for the Reader


This book is aimed at a broad category of users of extreme value statistics: engineers,
researchers, professors, students. Through a cross-disciplinary state of the art
spanning different professions, and thanks to the practical experience of the authors,
it aims to bring them to adopt the practice and legitimate questioning of an
intelligent customer faced with a concrete problem.

The “quest” proposed to the reader throughout this book is mainly oriented
toward providing answers to an engineer confronted with basic questions of the type: What
is the intensity of a hazard associated with a given return period? Starting from
theoretical building blocks and the characteristics of observed samples, the book 
turns to questions about the temporal homogeneity of the phenomenon being looked 
at, then shows how to attempt to gradually improve the statistical significance of 
a sample by integrating various information sources. Questions of inference (the 
explanatory approach) then start to mingle with those of forecasting and extrapolation 
(the predictive approach). The last section of the book, devoted to a comprehensive 
look at examples from industry, will help readers put elements of theoretical and 
applied critical analysis from earlier parts of the book into practice. 

Ultimately, we think that this approach promotes learning and raises, as we go
along, legitimate questions about the appropriateness of each method, thus allowing
us to develop a critical eye for all tools derived from extreme value theory.


Warning 


The methods and tools described in this book are useful resources for the intended 
audience of engineers and researchers. An engineer has to be a critical analyst of 
methods sprouting from the general theory of extreme values, which are constantly 
evolving. The search for and understanding of data requires a delicate statistical 
dance, built on a necessary dialogue between engineers and experts. Similarly, selec- 
tion and then application of any statistical model that attempts to be consistent with 
observed natural hazard data, cannot occur without ongoing critical analysis of the 
model’s merits, especially when testing and validating it. Note that the aim of the 
book is not to give a comprehensive inventory of all existing extreme value methods 
in the literature; numerous references to highly specialized research results are given 
in the text, and curious readers are encouraged to follow these to their source. Instead, 
our underlying goal is to provide a general methodological framework, constantly 
accompanied by useful practical tips. 

Therefore, the book should not be looked at as scientific doctrine, to be followed 
to the letter. The actual design of protective structures—at EDF and elsewhere—is in 
reality conducted using several approaches to risk modeling, some of which will be 
specific to the geographic site and hazard being considered. Extreme value theory is 
a possible ingredient in this process, its advantage being its general and non-specific 
nature. For this reason, it is often called upon to play a complementary role, or used 
for verification purposes. 


Part I 
Standard Extreme Value Theory 


Chapter 3
Probabilistic Modeling and Statistical
Quantification of Natural Hazards


Pietro Bernardara and Nicolas Bousquet 


Statistics are no substitute for judgment 


Henry Clay 


Abstract This chapter presents the main useful concepts of the statistical approach 
to extreme values, such as return periods or levels, as well as a general methodology 
for conducting a statistical study. It also introduces alternative approaches, statistical 
or not, that can be valuable in dealing with extreme natural hazards. 


3.1 The Rationale for a Statistical Approach to Extreme 
Values 


3.1.1 Practical Relevance and Scope 


3.1.1.1 Considering Natural Phenomena to Be Random 


The statistical approach to which this book is devoted is based on the fundamental
assumption that extreme phenomena belong to a family of events possessing a certain
stochastic regularity, and that observations $x_1, \ldots, x_n, \ldots$ of these events exist.¹ A
swollen river can be interpreted as an extreme water flow, significantly different
from the usual one. However, the extreme phenomenon will still be characterized
by the same underlying mechanisms that lead to the normal phenomenon (rainfall
upstream, glacial melting, etc.). Statistical quantification then appears naturally
from the following hypothesis.


1 We will use the generic word data in the rest of the book.


Hypothesis 3.1 The data $x_1, \ldots, x_n$ can be considered to be observed values of a
random variable X. 


Extreme values can then be defined as maxima of samples of observed values
of X (block maxima approach, or MAXB), or as observed values that exceed a certain
threshold (Peaks Over Threshold approach, or POT). More broadly in statistics,
extreme values are extreme quantiles $x_p$ of X, characterized by probabilities of not
exceeding such values, $F = P(X < x_p)$, that are very close to 1. The word significantly,
used above, has a precise meaning proposed by the classical statistics
of Fisher [282] and Pearson [774], and recently deepened by Johnson [417].

From an engineer’s point of view, this hypothesis gives a practical way to model 
the phenomenon of interest, and therefore to answer questions of risk quantifica- 
tion. This consensual definition of randomness corresponds well with the nature of 
observations and predictions of typically considered phenomena (see Sect. 3.1.2 for 
details). Probability theory, whose framework is introduced in Sect. 4, is the best- 
adapted mathematical environment for handling random events [442] (Sect. 1.2: “The 
Relation to Experimental Data”) [623, 700]. The term natural hazard will be used 
extensively in this book to refer to such phenomena, both in terms of observations 
and predictions pertaining to them. 


3.1.1.2 Two Fundamental Theorems 


The two sampling approaches introduced above (MAXB and POT) are amenable 
to powerful results from extreme value theory, itself essentially built on two funda- 
mental theorems: the Fisher-Tippett [283] and Pickands [615] theorems. These two 
theoretical results lead to two statistical modeling methods for extreme values. 


The Fisher-Tippett theorem states that the probability distribution of suitably
normalized maxima (or minima) of samples of independent and identically
distributed (iid) variables converges, as the sample size grows, to a member of a
family of distributions called generalized extreme value (GEV) distributions, under
standard conditions [173].

Under similar conditions, Pickands’ theorem states that the probability distribution
of the values above (or below) an extreme threshold u (excess sample) converges,
as the threshold becomes more extreme, to a member of a family of distributions
called generalized Pareto distributions (GPD).


Techniques for sampling observations (MAXB and POT), and using the corre- 
sponding statistical distributions (GEV and GPD) studied in the literature and applied 
to real-world situations, have been chosen and developed based on these two theorems.
Taking a simplistic point of view, an analogy can be made between them and
the central limit theorem (see Sect. 4.2.2), which establishes the convergence of the
distribution of the suitably normalized mean of a sample of random variables to a
normal distribution. In particular, the first theorem above establishes the convergence
of the distribution of the maxima of samples of random variables to the GEV family.
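
The two sampling approaches and their associated distributions can be sketched numerically. The example below is a minimal illustration, assuming SciPy's `genextreme` and `genpareto` distributions and a synthetic series of independent exponential values standing in for real observations; the series length and the 99% threshold are hypothetical choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic daily series standing in for observations (hypothetical data):
# 50 "years" of 365 independent exponential values.
n_years, n_days = 50, 365
daily = rng.exponential(scale=1.0, size=(n_years, n_days))

# MAXB: one maximum per block (year), fitted with a GEV distribution.
# SciPy's shape parameter c has the opposite sign to the usual shape xi.
annual_maxima = daily.max(axis=1)
c, loc, scale = stats.genextreme.fit(annual_maxima)
xi_gev = -c

# POT: excesses over a high threshold u, fitted with a GPD (location fixed at 0).
u = np.quantile(daily, 0.99)
excesses = daily[daily > u] - u
xi_gpd, _, sigma_gpd = stats.genpareto.fit(excesses, floc=0.0)

print(f"GEV: xi={xi_gev:.2f}, loc={loc:.2f}, scale={scale:.2f}")
print(f"GPD: xi={xi_gpd:.2f}, sigma={sigma_gpd:.2f} (threshold u={u:.2f})")
```

For exponential data both fitted shape parameters should be close to zero, up to sampling noise. That the two approaches agree on the shape is the tail consistency property that these theorems imply.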


Remark 3.1 Important properties also derive from these theorems, which reinforce 
their universality: the shape of the distribution tail is consistent between the two 
approaches, and the law of maxima obeys a fundamental principle known as max- 
stability: the maximum of a sample of maxima follows a law belonging to the same 
GEV family (see Definition 6.5 for a formalization of this notion). The dual principle 
is the stability of the GPD family when the threshold is modified for a higher value 
(see Proposition 6.3 and [588]), a property also called heredity. 
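
Max-stability can be checked numerically in the simplest case. The sketch below assumes the Gumbel distribution (the GEV member with zero shape parameter) with location 0 and scale 1, and an arbitrary number k of maxima; it compares the distribution function of the maximum of k independent Gumbel variables with a Gumbel distribution shifted by log(k).

```python
import numpy as np
from scipy import stats

k = 10                               # number of maxima being combined (arbitrary)
x = np.linspace(-2.0, 12.0, 200)     # evaluation grid

# If M_1, ..., M_k are independent Gumbel(0, 1) variables, then
# P(max M_i <= x) = F(x)**k, with F the Gumbel(0, 1) distribution function.
cdf_of_max = stats.gumbel_r.cdf(x) ** k

# Max-stability: the result is again Gumbel, with location shifted by log(k).
shifted_gumbel = stats.gumbel_r.cdf(x, loc=np.log(k), scale=1.0)

# Largest discrepancy over the grid (expected to be at rounding-error level).
print(np.max(np.abs(cdf_of_max - shifted_gumbel)))
```

The identity holds exactly here because exp(-exp(-x)) raised to the power k equals exp(-exp(-(x - log k))).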


3.1.1.3 Frequency of Occurrence and Return Levels 


For phenomena described using time series, it is common usage to quantify the 
occurrence of an extreme hazard in terms of annual exceedance probability [260], 
denoted by p. This is defined as the probability that a given value of the relevant 
variable X, often denoted by $x_p$, is exceeded in a given one-year period. The value
$x_p$ is called the return level.

The annual frequency of occurrence is a classical statistical estimator of p (Sect. 
4.2.2). For example, we can associate a river flow of 300 m³/s with the value p = 0.01
if we estimate that the probability of observing a flow of X > 300 m³/s over a
one-year period is equal to 0.01.

In the more general context of sampling according to a different frequency, the
relationship between $F(x_p)$ and the annual exceedance probability p is more complex.
The occurrences of exceedances are then considered independent and follow a
homogeneous Poisson process (see Sect. 4.1.2.1 and Definition 4.2), and the average
number $\lambda$ of events per year present in the sample must be taken into account. It
comes to $p = 1 - F^{\lambda}(x_p)$, and consequently $T = 1/(1 - F^{\lambda}(x_p))$, where
$T = 1/p$ is the return period defined in Sect. 3.1.1.4. In practice, the
latter formula is often approximated by $T = 1/(\lambda(1 - F(x_p)))$ [667].
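
The two expressions for T can be compared numerically. In the sketch below, the function names and the values of the mean number of events per year and of F at the return level are hypothetical, chosen only to show how close the approximation is when F is near 1.

```python
def return_period_exact(F_xp: float, lam: float) -> float:
    """T = 1 / (1 - F(x_p)**lam), with lam the mean number of events per year."""
    return 1.0 / (1.0 - F_xp ** lam)


def return_period_approx(F_xp: float, lam: float) -> float:
    """First-order approximation T = 1 / (lam * (1 - F(x_p)))."""
    return 1.0 / (lam * (1.0 - F_xp))


lam = 3.0      # hypothetical: three sampled events per year on average
F_xp = 0.999   # hypothetical non-exceedance probability of the level x_p

exact = return_period_exact(F_xp, lam)
approx = return_period_approx(F_xp, lam)
print(f"exact T = {exact:.1f} years, approximate T = {approx:.1f} years")
```

The approximation follows from a first-order expansion of F raised to the power lam around F = 1, so it degrades as F moves away from 1, that is, for less extreme levels.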


3.1.1.4 Return Periods 


The return period concept, more intuitive for engineers and decision-makers (and 
even the general public) and commonly used in regulation, was popularized by 
Gumbel [364] and is defined as follows. If the probability p of observing a flow
of X > 300 m³/s in a one-year period is equal to 0.01, it follows that over a
1 000-year period, we can expect to see, on average, ten events where such a flow occurs.
Ten times in 1 000 years means on average once every 100 years. This is what we
call the return period, which is given by the simple formula


T = 1/p.
Nevertheless, this concept must be used with caution. The formulation “on average
once every 100 years” has often led neophytes in the field to believe that the extreme 
phenomenon in question should occur periodically every 100 years, which is of 
course false [661]. A more formal explanation of the return period concept is offered 
in Proposition 6.5. 
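
A short calculation helps dispel the periodicity misconception. The sketch below assumes independent years with a constant annual exceedance probability p = 1/T (an idealization), and computes the probability of observing at least one exceedance of the T-year level within a given number of years; the function name is illustrative.

```python
def prob_at_least_one(T: float, n_years: int) -> float:
    """Probability of at least one exceedance of the T-year level in n_years,
    assuming independent years with constant annual probability p = 1/T."""
    p = 1.0 / T
    return 1.0 - (1.0 - p) ** n_years


for n in (10, 50, 100):
    print(f"{n} years: {prob_at_least_one(100.0, n):.3f}")
```

Under these assumptions, a centennial event has a probability of roughly 63% of occurring at least once in any given 100-year window, and may just as well occur twice in a decade or not at all in two centuries.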


3.1.1.5 Typical Return Levels 


In practice, the dimensioning of industrial installations requires the assessment of 
extremely rare events, occurring on average once every 100, 1 000, or even 10 000 
years. The hazard is thus labeled (especially in hydrology) centennial, millennial, or 
decamillennial. Checking compliance with regulations therefore requires estimating
annual exceedance probabilities typically between $10^{-2}$ and $10^{-4}$.

However, available data series for natural phenomena tend to go back 20 to 40 years
only (see Sect. 5.2.2 for more details). It is thus frequent to try to estimate return 
levels that have never been observed, meaning attempts to extrapolate the available 
statistical information. In such cases, it becomes impossible to provide a “classical” 
validation of the modeling in which we compare real data with the model’s predictions 
[29]. For example, to validate a millennial rainfall model using observed data, we 
would require access to a sequence of several millennial observations, not to mention 
the fact that the validation would be weakened by intrinsic variability affecting this 
observed sample. The actual limits up to which extrapolation remains credible are 
still a subject of debate today [174, 484, 645]. From a methodological standpoint, 
our positioning is stated in the introduction of the book: extreme value theory is a 
useful and objective ingredient, but validating the acceptability of results remains 
the task of expert decision-makers. 
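
The difficulty of extrapolating from short records can be made concrete with a small experiment. The sketch below is an illustration only: it assumes a synthetic record of 30 annual maxima drawn from a Gumbel distribution with hypothetical parameters, fits a Gumbel model, and uses a nonparametric bootstrap to gauge the sampling variability of the estimated 1 000-year return level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical record: 30 annual maxima drawn from a Gumbel distribution.
observed = stats.gumbel_r.rvs(loc=100.0, scale=20.0, size=30, random_state=rng)


def return_level(sample, T):
    """Fit a Gumbel distribution and return the level exceeded with
    annual probability 1/T."""
    loc, scale = stats.gumbel_r.fit(sample)
    return stats.gumbel_r.ppf(1.0 - 1.0 / T, loc=loc, scale=scale)


T = 1000.0
point_estimate = return_level(observed, T)

# Nonparametric bootstrap: refit on resampled records.
boot = np.array([
    return_level(rng.choice(observed, size=observed.size, replace=True), T)
    for _ in range(500)
])
low, high = np.percentile(boot, [5, 95])
print(f"{T:.0f}-year level: {point_estimate:.1f} "
      f"(90% bootstrap interval: {low:.1f} to {high:.1f})")
```

Such an interval reflects sampling variability only. It says nothing about whether the chosen model remains valid far beyond the observed range, which is precisely the point debated above.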


3.1.2 Phenomenological Relevance 


Most natural hazards are characterized by permanent uncertainty as to their future 
state [234]. For example, it is impossible to exactly predict tomorrow’s temperature at 
a given time and place, as it is a measurement of the thermal agitation of particles, i.e., 
a measurement of energy exchange at the heart of a thermodynamic system subject 
to an indeterminacy principle. Furthermore, the measurement necessarily involves a 
spatial and temporal approximation. Any sequence of temperature measurements or 
predictions thus has the following property: the “missing” information on the size 
of the error cannot be set to zero; there always remains an incompressible margin 
of uncertainty that has, by definition, the properties of a random phenomenon [150, 
212, 240]. This is also characteristic of phenomena such as rain and waves. 
The physics of water flow, on the other hand, might seem at first glance to be
deterministic: if all parameters are known, the phenomenon’s behavior is entirely 
predictable, and a probabilistic framework would seem misplaced. However, this 
does not take into account our ignorance as to the exact parameter values, nor whether 
they indeed entirely characterize the situation. Are we really capable of completely 
characterizing a natural phenomenon by a set of parameters and structural equations? 
If so, are we capable of absolute precision, in terms of what we know about them? 
Epistemic uncertainty, due to missing knowledge, characterizes such problems. It 
is typically distinguished from random uncertainty, by considering that it can be 
reduced by adding knowledge, whereas random uncertainty cannot [216, 386, 671]. 
However, the notion of knowledge addition is conditional on the nature of the tools 
used to represent this knowledge and translate it into information. In contemporary 
scientific practice, the addition of knowledge is most often achieved by improving 
models [109, 767]. These necessarily suffer from a form of error that is linked to the 
deterministic and discretized way they are computationally implemented (see, for 
example, [390] and [240], Sect. 4.2). In other words, when characterizing complex 
phenomena, there always remains an irreducible error whose features, again, are
those of a random phenomenon.²

Besides the practical relevance discussed in Sect. 3.1.1, these considerations offer 
an epistemological justification for the use of probability distributions (Sect. 4.1.2) 
and processes (Sect. 4.1.5) to represent natural phenomena in a simplified way, as 
well as the use of statistical techniques to estimate these distributions and processes 
and make predictions and extrapolations. 


3.1.3 How Should We Conduct a Standard Study? 


The work undertaken to characterize good practice for extreme value statistics studies
within EDF revealed six fundamental steps that should be followed. These can
be summarized by the following key points:


1. Defining variables and probability levels. Here, we try to define the problem to
be addressed. Which rare event do we wish to model? Which variable z (the variable
of interest) can be used to model it? Is this choice dictated by regulations? Is it
accessible to measurement or simulation (x = z), or do we need to study another
variable $x \neq z$ (a study variable)? What is the probability level to estimate? Does
it depend on current regulations, and should the evolution of these be expected?
Is this level scientifically acceptable, as discussed in Introduction (Sect. 1.2)? If
several hazards are simultaneous, then what do we want to estimate (because
here, the very meaning of probability level needs to be defined)? Similarly, if there
is a trend in the phenomenon, the return level concept is no longer pertinent; what
then should we quantify?


2 However, many other theories of uncertainty can usefully, and sometimes more appropriately, be
proposed to represent epistemic uncertainty, such as possibility theory [231] and fuzzy logic [195].


Example 3.1 (Eurocodes) 


The study variable x (temperature, precipitation, etc.) is sometimes similar 
to the variable of interest z defined by regulations or norms; this is the case, 
for example, in the Eurocodes, a set of European norms which standardize 
methods of calculation, and are used to verify the stability and dimensioning 
of buildings and civil engineering structures. See
http://normalisation.afnor.org/thematiques/eurocodes for more details.


Since the variable of interest z is the decision variable related to a project, we may
need to work with a study variable x that is connected to it by a relationship such
as

z = f(x, d),


where d is a vector of deterministic parameters, and the function f may be a deter- 
ministic or stochastic physical model [660], or a probabilistic model of connected 
or combined hazards. 


Example 3.2 (Dimensioning of dikes) 


In a design study for dikes protecting a riverside power station, the variable of 
interest or project variable is the height (level) of the river. Nevertheless, the 
extreme value statistics study may be conducted on the river flow. The river 
flow is more conducive to modeling than the height [10], and for this reason, 
this approach is also recommended in regulations [1]. The mapping between 
the study variable (river flow) and the variable of interest (river height) is then 
performed using a calibration curve or hydraulic modeling (see Example 4.7). 
Another transformation of the variable has to be made if hydraulic structures
at the site can influence the flow. In this case, it is necessary to reconstruct
the sequence of natural flows, i.e., the flows that would be observed if no
hydraulic structures were present. This procedure is known as
flow naturalization [10].
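
The relationship z = f(x, d) of Example 3.2 can be sketched in a few lines of code. The sketch below assumes a hypothetical power-law calibration curve Q = a (h - h0)^b between river height h and flow Q; the functional form and the values of a, b, and h0 are illustrative assumptions only, and are not taken from the studies cited above.

```python
def flow_from_height(h, a=35.0, b=1.6, h0=0.4):
    """Hypothetical power-law calibration curve: Q = a * (h - h0)**b (m^3/s)."""
    if h <= h0:
        return 0.0
    return a * (h - h0) ** b


def height_from_flow(q, a=35.0, b=1.6, h0=0.4):
    """Inverse mapping z = f(x, d): river height (m) for a given flow q (m^3/s)."""
    if q < 0:
        raise ValueError("flow must be non-negative")
    return h0 + (q / a) ** (1.0 / b)


# Study variable x = flow; deterministic parameters d = (a, b, h0).
design_flow = 1200.0                      # illustrative extreme flow value
design_height = height_from_flow(design_flow)
```

In such a setting, the extreme value analysis would be carried out on the flow, and only the final return level would be mapped to a height.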




Example 3.3 (Sticky snow) 


Sticky snow is a type of snow which can lead to accretion* on electric power 
cables. The variable of interest is the value of the load on a cable (which could 
lead to rupture), which can be indirectly determined from meteorological data 
[2141]. 


2. Data collection. This is a crucial step in any statistical study, and particularly
for extreme value statistics, where available data are scarce. What data are available
for the study? Are they directly related to quantifying the hazard? Is the information
they contain reliable? Sufficient?

3. Sampling extreme values. Extreme value theory allows for two main types of
sampling: block maxima over periods (MAXB approach) or threshold exceedances
(POT approach). Which is the most pertinent with respect to the amount and
type of data available? When two or more hazards are related, how should the
sampling be adapted?

4. Modeling and statistical estimation. Observed data are then modeled using a
statistical distribution, either of type GEV or GPD, which is appropriate to the
extreme data in question and coherent with the choice of sampling made above.

5. Testing and visual checks. The chosen distribution can be partly validated using
statistical tests and plots.

6. Calculation of results and validation. In this final step, rare probabilities
and return levels (or functions of interest specific to the case of joint
hazards or the existence of a trend) are estimated using the chosen distribution,
and results are interpreted according to know-how and expertise in each domain.
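
The two sampling schemes named in the sampling step can be illustrated as follows. This is a minimal sketch on synthetic data: the exponential "daily" series, the block length of 365 values, and the choice of the empirical 98% quantile as threshold are all assumptions made for illustration, and no declustering of exceedances is attempted.

```python
import random


def block_maxima(series, block_size):
    """MAXB sampling: maximum of each complete block of `block_size` values."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def peaks_over_threshold(series, threshold):
    """POT sampling: excesses over `threshold` (no declustering)."""
    return [x - threshold for x in series if x > threshold]


random.seed(0)
daily = [random.expovariate(1.0) for _ in range(365 * 30)]   # 30 synthetic "years"
annual_max = block_maxima(daily, 365)
u = sorted(daily)[int(0.98 * len(daily))]    # empirical 98% quantile used as threshold
excesses = peaks_over_threshold(daily, u)
```

The block maxima would then be modeled with a GEV distribution and the excesses with a GPD, in line with the modeling step.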


Extreme value sampling, estimation, testing, and graphical verification are classical
steps when applying extreme value theory, for each of which it is possible
to draw on a wide range of methods. By contrast, in the definition of variables
and probability levels (1), in data collection (2), and in the calculation of results and
validation (6), the need for specific expertise may arise. Within these three stages
(1-2-6), it is therefore particularly important to take a critical look at the strict
relevance of extreme value theory. It is in these three stages in particular that the practical
experience of EDF’s engineers and researchers brings a certain originality to this book.
3.2 Alternative Approaches


3.2.1 Alternative Statistical Approaches 


The classical hypotheses of extreme value theory are difficult to verify fully
for numerous natural hazards, so an engineer’s work consists in getting as close as
possible to them. Because there remains the possibility of error in the historical and
current use of these hypotheses in each domain, exemptions to the strict application of
extreme value statistical modeling exist, and engineers regularly use other distributions
instead of extreme value distributions (exponential, lognormal [452], generalized
exponential [491], or mixtures of simple distributions [587]), which can be estimated from
other types of sampling. In this book, deviations from the theory will be clearly
signaled as such. These historical and even current practices may also be the result
of adaptations of the theory of extremes. The Gumbel distribution, for example,
which is a special case of the GEV distribution, is still widely used by hydrologists
(see Remark 6.5).


3.2.2 Stochastic Scenario Approaches 


Other approaches, which may be termed stochastic, differ from statistical approaches 
because they seek to model the behavior of the phenomenon giving rise to the extreme 
hazard. These approaches are considered further in Chap. 10. This modeling strategy 
may be based on an empirical examination of all observations of the phenomenon 
(not just extreme ones) and a choice of a random process, possibly stationary (Sect. 
4.1.5). Such approaches make it possible to re-simulate the empirical behavior of the 
phenomenon, possibly with the help of covariates, and naturally produce extreme 
events. 

A more physics-based approach can also be developed using deterministic mech- 
anistic models, implemented as code. EDF has developed, for example, tools for 
simulating the physics of the transformation of rainfall into flow rates (such as the 
MORDOR code for hydrology [306]), and from flow rates to water height (the MAS- 
CARET [351] and TELEMAC [301] hydraulics codes). 

These deterministic tools can then be associated with Monte Carlo stochastic 
simulations, as in EDF’s SCHADEX method [586], for simulating long series of 
“artificial” observations whose statistical properties can be analyzed. 
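
The coupling of a deterministic model with Monte Carlo simulation can be sketched as follows. Both components are toy stand-ins chosen for illustration: the rainfall generator and the single linear reservoir are assumptions of this sketch and bear no relation to the actual MORDOR or SCHADEX implementations.

```python
import random


def simulate_daily_rain(n_days, rng, p_wet=0.3, mean_depth=8.0):
    """Toy stochastic rainfall generator: a day is wet with probability p_wet,
    and wet-day depths (mm) are exponential with mean mean_depth."""
    return [rng.expovariate(1.0 / mean_depth) if rng.random() < p_wet else 0.0
            for _ in range(n_days)]


def linear_reservoir(rain, k=0.2, storage0=0.0):
    """Toy deterministic rainfall-runoff model: a single linear reservoir
    releasing a fraction k of its storage each day."""
    storage, flows = storage0, []
    for r in rain:
        storage += r
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def simulate_annual_maxima(n_years, seed=1):
    """Monte Carlo loop: one simulated year per iteration, keeping the yearly maximum."""
    rng = random.Random(seed)
    return [max(linear_reservoir(simulate_daily_rain(365, rng))) for _ in range(n_years)]


annual_maxima = simulate_annual_maxima(200)
empirical_q99 = sorted(annual_maxima)[int(0.99 * len(annual_maxima))]
```

The long artificial series of maxima can then be analyzed empirically or with the statistical tools of the following chapters.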


Chapter 4
Fundamental Concepts of Probability
and Statistics


Nicolas Bousquet 


Abstract This chapter recalls the fundamental concepts and results of probability 
theory and statistical theory. These are essential to understand and use the tools 
of extreme value theory in an appropriate way. From the notions of randomness, 
probability distributions to random processes, from classical estimators to regression 
models, this chapter aims to facilitate the technical understanding of the rest of this 
book. 


4.1 Main Concepts and Notions 


Probabilistic modeling of a hazard X rests on treating X as a random variable
taking values in a d-dimensional sample space $\Omega$. Since $\Omega \neq \emptyset$, the collection $\mathcal{A}$ of subsets
of values in $\Omega$ that X can take is non-empty, and comes with a certain stability: a countable
union of elements of $\mathcal{A}$ is still in $\mathcal{A}$, as is the complement of any element of $\mathcal{A}$.

These fundamental properties allow us to “pave over” (measure) the set $\Omega$ so that
any observation (occurrence) of an event $A \in \mathcal{A}$ can be associated with a numerical
value P(A). The set of these values is contained in the interval [0, 1], and we have
that

$$P(\Omega) = 1.$$

P is therefore called a probability measure.

In probability theory, the triplet $(\Omega, \mathcal{A}, P)$ is called a probability space, where
the set $\Omega$ is the sample space, and $\mathcal{A}$ is a $\sigma$-algebra. In general, $\mathcal{A}$ is chosen to
be the set of subsets of $\Omega$ which are Lebesgue measurable (see Sect. 4.1.1), but it is
usually not defined explicitly in applied problems.
4.1.1 One-Dimensional Problems


Consider to begin with the case where d = 1. If $\Omega$ is a discrete set (for example,
$\Omega = \{1, 2, 3, \ldots\}$) or a categorical one (for example, $\Omega = \{A_1, A_2, \ldots\}$), where
each $A_i$ defines a class of values of X that is disjoint from all other $A_j$, and more
generally if $\Omega$ is countable, the probability distribution is said to be discrete, and is
determined by the probability mass function

$$f(x) = P(X = x)$$

for each $x \in \Omega$. However, the great majority of random variables encountered in this
book will be continuous. In particular, the values of X (wind speed, temperature, flow
rate, etc.) are continuous, which is an essential condition for applying extreme value
theory, and $\Omega$ is generally a continuous subset of $\mathbb{R}^d$, even if measuring devices
are limited to a certain accuracy in practice. This accuracy does not play a role
in the construction of the probabilistic model but rather in that of the statistical
model, which encompasses the probabilistic model by establishing a direct link with
noisy observations (see Sect. 4.1.6). In practice, the models are identical if the noise
affecting the data can be considered negligible.

In the continuous case, i.e., when $\Omega$ is no longer countable, the probability
distribution can be defined via the cumulative distribution function

$$F_X(x) = P(X \leq x)$$

for each value $x \in \Omega$. In order to satisfy the probability axioms [442], this function
must be non-decreasing, and for d = 1,

$$\lim_{x \to x_{\mathrm{inf}}} F_X(x) = 0, \qquad \lim_{x \to x_{\mathrm{sup}}} F_X(x) = 1,$$

where $(x_{\mathrm{inf}}, x_{\mathrm{sup}})$ are the lower and upper bounds (perhaps infinite) of $\Omega$. The
multidimensional case (d > 1) is examined in Sect. 4.1.4. Returning to d = 1, an
approximate “equivalent” to the discrete probability f(x) in the continuous case is given
by the probability that X is between x − a and x + b (with a, b > 0):

$$P(x - a \leq X \leq x + b) = F_X(x + b) - F_X(x - a).$$


This property pushes us to define, in cases where $F_X$ is differentiable, the derivative
of $F_X$ as the limiting case $a = b = \varepsilon \to 0$,

$$f_X(x) = \frac{dF_X}{dx}(x),$$

which is called the probability density of X, and therefore satisfies

$$F_X(x) = \int_{x_{\mathrm{inf}}}^{x} f_X(u) \, du$$

and

$$P(x - a \leq X \leq x + b) = \int_{x-a}^{x+b} f_X(u) \, du.$$

Necessarily, $\int_{\Omega} f_X(u) \, du = 1$. Thus, any continuous probability distribution in
dimension d = 1 (though this is also true for d > 1) can be written interchangeably
(if differentiable¹) in terms of its cumulative distribution function or its density
(Fig. 4.1).

Fig. 4.1 Left: Example of a cumulative distribution function. Right: Frequency histogram of X
and the corresponding probability density (curve)
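
The relations between the cumulative distribution function and the density can be checked numerically. The sketch below uses the exponential distribution with rate 1 as an arbitrary example (an assumption of this illustration), together with a simple midpoint rule for the integrals.

```python
import math


def cdf(x, rate=1.0):
    """Cumulative distribution function of the exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def pdf(x, rate=1.0):
    """Probability density of the exponential distribution."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


def integrate(f, lo, hi, n=10_000):
    """Midpoint rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


x, a, b = 1.3, 0.2, 0.5
prob_from_cdf = cdf(x + b) - cdf(x - a)           # F(x + b) - F(x - a)
prob_from_pdf = integrate(pdf, x - a, x + b)      # integral of the density
eps = 1e-6
derivative = (cdf(x + eps) - cdf(x - eps)) / (2 * eps)   # approximates f(x)
total_mass = integrate(pdf, 0.0, 50.0)            # close to 1 (tail beyond 50 is negligible)
```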

Informally, $f_X$ can be seen as the limit of a histogram giving the frequencies
of possible values of X in smaller and smaller classes (Fig. 4.1). More formally,
the cumulative distribution function and probability density need to be seen as tools
useful for taking a measure of the distribution of X with respect to that of the space $\Omega$.

Suppose, for example, that $\Omega = I_1 \times I_2 \times \cdots \times I_d$, where each $I_i$ is an interval
of $\mathbb{R}$ (closed, open, or half-open). It is therefore a parallelepiped containing all
observable values of X. This solid (or space) can be characterized by several
measures, such as, for example, its volume. The Lebesgue measure [468], written as $\mu_L$,
was constructed to be a reference measure that allows the description of these types
of spaces in a universal and uniform way. As for volumes, it has a finite value if $\Omega$
is compact. The density $f_X$ defines another measure on $\Omega$ which specifies the shape
of the distribution of X and allows us to distinguish it from uniformity. One should
therefore interpret it as a relative measure with respect to the Lebesgue one (we also
say: dominated by the Lebesgue measure). For readers interested in a detailed
introduction to measure theory, we suggest the books [106] (for an engineer’s approach)
and [459] (for a more mathematical one).


1 And more generally, differentiable in d > 1 dimensions.
Information contained in probability distributions can often be summarized using
what are called the kth-order moments $M_k$, $k \in \mathbb{N}$, defined as the means of the
variables $X^k$:

$$M_k = E[X^k] = \int_{\Omega} x^k f_X(x) \, dx.$$

If these exist for k = 1 and k = 2, we can define the expectation $E[X] = M_1$ and the variance

$$V[X] = \int_{\Omega} (x - E[X])^2 f_X(x) \, dx.$$

The expectation gives us a measure of the mean of X with respect to its distribution
$f_X$, while V[X] is a measure of the variation in (or dispersion of) $f_X$. The standard
deviation of X is defined by

$$\sigma_X = \sqrt{V[X]}.$$

Alternatively, the coefficient of variation

$$\mathrm{CV}[X] = \frac{\sigma_X}{E[X]}$$

provides another relative measure of the variability in or dispersion of $f_X$ (and is
especially used by engineers). Lastly, we will say that a variable X is standardized
if it has been centered (by subtracting its expectation), and divided by its standard
deviation:

$$X' = \frac{X - E[X]}{\sigma_X}, \tag{4.1}$$

thus transformed into a variable X’ with mean 0 and variance 1.
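
These summaries have direct empirical counterparts. The following sketch computes sample moments, the coefficient of variation, and the standardization (4.1) on a synthetic sample; the gamma-distributed "flows" are an assumption made purely for illustration.

```python
import math
import random


def moment(sample, k):
    """Empirical kth-order moment: mean of x**k over the sample."""
    return sum(x ** k for x in sample) / len(sample)


def standardize(sample):
    """Center and scale a sample as in (4.1); returns (standardized, mean, sd)."""
    mean = moment(sample, 1)
    var = moment(sample, 2) - mean ** 2      # V[X] = M_2 - M_1^2
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in sample], mean, sd


rng = random.Random(42)
flows = [rng.gammavariate(2.0, 150.0) for _ in range(5000)]   # synthetic sample
z, mean, sd = standardize(flows)
cv = sd / mean                                # empirical coefficient of variation
```

For the gamma distribution with shape 2 used here, the theoretical coefficient of variation is $1/\sqrt{2}$, which the empirical value should approach.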


4.1.2 Families of Parametric Models 


Let us now recall several fundamental probabilistic and statistical models which will 
appear often in more complex situations in this book. In our treatment, we are going 
to focus on parametric models, i.e., ones that can be entirely characterized by a finite 
set of parameters. 

The first reason for this choice is connected to the framework of our studies. 
Indeed, the extreme value behavior of random samples follows, under certain the- 
oretical conditions, parametric distributions (see Chap. 6). The same is true for the 
behavior of statistical estimators (Sect. 4.2.2), which obey (parametric) laws of large 
numbers. 

This underlying argument is consolidated by the following observation: when
interested in extreme-valued events, the number of available data may be very low.
Attempting to describe the mechanism creating such events using a random process
characterized by an infinite (or merely large) number of parameters (i.e., more than
the number of observations) is rather impractical, since most of the parameters will
have to remain unknown, or have several possible values. The final model would be
of little relevance.

In the rest of the book, in its most general form, $\psi$ will indicate this vector
of parameters, living in a finite-dimensional space. The conditioning by $\psi$ on the
random mechanism producing events will be made clear in the notation for densities
and cumulative distribution functions: $f(x) = f(x|\psi)$ and $F(x) = F(x|\psi)$.


4.1.2.1 Distributions 


In the discrete case, we may be interested in the occurrence of an event of the type
$Z > z_0$, where Z is, for example, the maximum monthly water level, and $z_0$ the height
of a protective dike. Suppose that we have a sample of indicator values
$(\delta_1, \ldots, \delta_n) \in \{0, 1\}^n$ equal to 1 if the event occurs, and 0 otherwise. Under the hypothesis that the $\delta_i$
are independent and each corresponds to a submersion test occurring with the same
probability p, and denoting $X_n = \sum_{i=1}^{n} \delta_i$ as the total number of “successes” from
n trials, the probability (mass) distribution of $X_n$ is

$$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$

for $x \in \Omega = \{0, 1, 2, \ldots, n\}$, where

$$\binom{n}{x} = \frac{n!}{x! \, (n - x)!}.$$

The random variable $X_n$ is then said to follow the binomial distribution $\mathcal{B}(n, p)$
(Fig. 4.2). In the case of a risk-quantification study, we would aim to estimate the
overflow probability p from the observed statistic $x_n$.

The variable $X_n$ defined above, known as a count variable, can be generalized
with the intention of estimating the occurrence of random events during a fixed time
period (e.g., a year). If we suppose that an event happens with a mean frequency of
$\lambda > 0$ in this time period, then the probability that it happens exactly
$X_n = x \in \Omega = \{0, 1, 2, \ldots\}$ times is

$$f(x) = \frac{\lambda^x}{x!} \exp(-\lambda),$$

which is none other than the probability mass function of the Poisson distribution with
mean $\lambda$ (Fig. 4.2). This distribution plays an important role notably in establishing
statistical distributions associated with historical observations, since it allows us to
model the arrival of a number of events between two time points (e.g., several dozen
years apart) that have not been directly observed. The technical connection between
the binomial and Poisson distributions can be stated as in the following lemma:
Fig. 4.2 Top: mass function of the discrete binomial distribution $\mathcal{B}(100, 0.1)$ (circles) and the
Poisson distribution $\mathcal{P}(15)$ (triangles). Bottom: continuous probability distribution of the standardized
normal distribution $\mathcal{N}(0, 1)$ (solid line) and a $\chi^2$ distribution (dashed line)


Lemma 4.1 If $X_n$ follows a binomial distribution $\mathcal{B}(n, p)$ with $p \ll 1$, then it can
be approximated by a Poisson distribution with mean $np$ as $n \to \infty$.
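
Lemma 4.1 can be illustrated numerically by comparing the two mass functions for a fixed mean np. In the sketch below, the value np = 3 and the truncation of the comparison to x ≤ 50 (to avoid floating-point overflow for large x) are choices made for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """Mass function of B(n, p)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Mass function of the Poisson distribution with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def max_pmf_gap(n, lam, x_max=50):
    """Largest absolute gap between B(n, lam/n) and Poisson(lam) for x <= min(n, x_max)."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(n, x_max) + 1))


# The gap shrinks as n grows with np = 3 held fixed.
gaps = {n: max_pmf_gap(n, lam=3.0) for n in (10, 100, 1000)}
```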


Recall furthermore, in the continuous case, the fundamental importance of the
normal distribution $X \sim \mathcal{N}(\mu, \sigma^2)$ which has, when d = 1 and $\Omega = \mathbb{R}$, mean $\mu$
and variance $\sigma^2$:

$$f_X(x) = \frac{1}{\sqrt{2\pi} \, \sigma} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right).$$

It can be used to model a great number of phenomena, in particular the distribution
of the mean of a random sample (law of large numbers). The variable $(X - \mu)/\sigma$
follows what is called a standardized normal distribution $\mathcal{N}(0, 1)$ (Fig. 4.2).
Usually, we denote by $\phi(\cdot)$ and $\Phi(\cdot)$ the density and cumulative distribution function,
respectively, of this standardized variable. The convergence of statistics to normal
distributions (see Sect. 4.2) is a classical result in the literature (central limit theorem).


4.1.3 Statistical Testing 


The general approach of statistical testing consists in rejecting or not rejecting
(without necessarily accepting) a so-called null statistical hypothesis $H_0$ given a set of data
$\mathbf{x}_n$. For example, in a parametric framework this assumption can correspond to the
specific choice of a value $\psi = \psi_0$ in the same family $f(x|\psi)$ or a domain $\psi \in \Psi_0$.
Defining a test is defining a statistic

$$R_n = R(X_1, \ldots, X_n)$$

which is a random variable, whose distribution $F_{R_n}$ is known (at least asymptotically,
i.e., when $n \to \infty$) when the $H_0$ hypothesis is true, and independent of the hypothesis
value (e.g., independent of $\psi$). More precisely, in a parametric framework when $\psi$ is
tested, $F_{R_n}$ must not depend on $\psi$, and the variable $R_n$ is called pivotal. When $R_n$
is defined independently of $\psi$, this statistic is called ancillary.

The localization of the observed statistic $r_n = r(x_1, \ldots, x_n)$ with respect to $F_{R_n}$
was defined by Fisher (1926; [282]) as the probability $p_{r_n}$ (called p-value) of observing an event
more “extreme” (smaller or larger) than $r_n$. The lower this probability, the further
away the event $r_n$ is from the highest density $R_n$ values, and the less likely $H_0$ is
(remember that the p-value is not the probability that $H_0$ is true). In other words, if
$H_0$ is false, then $r_n$ should be an extreme value of $F_{R_n}$.

The current approach to testing, known as the Neyman-Pearson approach (1928;
[474]), requires setting a threshold of significance $\alpha \ll 1$ defining extremality and
comparing the p-value $p_{r_n}$ with $\alpha$ (equivalently, for a statistic whose large values are
extreme, comparing $r_n$ with the quantile $q_{1-\alpha}$ of $F_{R_n}$); if $p_{r_n} < \alpha$, the event $r_n$ is even less
likely than $\alpha$, and $H_0$ must be rejected. Otherwise, this assumption is plausible (but
not necessarily validated). The common practice in all experimental sciences, again,
is to set $\alpha = 5\%$ or $\alpha = 1\%$, but these arbitrary thresholds are increasingly criticized
[258, 565], and it is currently recommended [73, 417] to conduct several tests and
test very low thresholds (e.g., $\alpha$ between 1‰ and 5‰).

In many cases, $R_n$ is chosen positive, so that the p-value $p_{r_n} = P(R_n > r_n)$ can be
simply defined. This is the case with the Kolmogorov-Smirnov test, which is widely
used to test the validity of the exponential law on waiting times between two cluster
maxima (see Sect. 6.5.2.1).


Example 4.4 (Kolmogorov—Smirnov test [723]) 


With the classical empirical estimator (cf. Sect. 4.2.2) $x \mapsto \hat{F}_n(x)$ of the
cumulative distribution function F of a unidimensional iid sample $x_1, \ldots, x_n$,
defined by

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{x_i \leq x\}},$$

and given a candidate $F_0$ for F, it is required to test the hypothesis $H_0: F = F_0$.
The test statistic is defined by

$$R_n = \sqrt{n} \, \sup_{x \in \mathbb{R}} \left| \hat{F}_n(x) - F_0(x) \right|.$$

Under $H_0$ and for a large n, $R_n$ approximately follows the Kolmogorov
distribution, with cumulative distribution function

$$F_{KS}(x) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k+1} \exp\left( -2 k^2 x^2 \right) \quad \text{for } x \in \mathbb{R}^+,$$

which is generally tabulated within traditional software tools.
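
A minimal sketch of this test is given below, in the spirit of the waiting-time application mentioned above. The synthetic exponential sample, the truncation of the Kolmogorov series to 100 terms, and the function names are assumptions of this illustration; the truncated series is not reliable for arguments very close to 0.

```python
import math
import random


def ks_statistic(sample, cdf0):
    """R_n = sqrt(n) * sup_x |F_n(x) - F_0(x)| for a continuous candidate F_0."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f0 = cdf0(x)
        # The supremum is attained at, or just before, a jump of the empirical cdf.
        d = max(d, (i + 1) / n - f0, f0 - i / n)
    return math.sqrt(n) * d


def kolmogorov_cdf(x, terms=100):
    """F_KS(x) = 1 - 2 * sum_{k>=1} (-1)**(k+1) * exp(-2 k^2 x^2), truncated series."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def ks_pvalue(sample, cdf0):
    """Asymptotic p-value P(R_n > r_n) under H0."""
    return 1.0 - kolmogorov_cdf(ks_statistic(sample, cdf0))


rng = random.Random(7)
waits = [rng.expovariate(2.0) for _ in range(500)]               # synthetic waiting times
p_good = ks_pvalue(waits, lambda t: 1.0 - math.exp(-2.0 * t))    # candidate with the true rate
p_bad = ks_pvalue(waits, lambda t: 1.0 - math.exp(-1.0 * t))     # candidate with a wrong rate
```

With the wrong rate, the p-value is expected to be essentially zero, so $H_0$ would be rejected at any usual threshold.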


For an important class of so-called $\chi^2$ (chi-squared) tests, $R_n$ is designed to
follow a $\chi^2$ distribution with $q \geq 1$ degrees of freedom,

$$R_n \sim \chi^2_q,$$

whose density function is displayed in Fig. 4.2 for q = 1. $\chi^2$ distributions are
intrinsically linked to normal distributions through a quadratic relation. For instance, the
sum of the squares of n independent $\mathcal{N}(0, 1)$ random variables follows a $\chi^2$
distribution with n degrees of freedom. The quantiles of the latter are provided in practice
by specific tables or algorithms.


Test power. Different procedures testing the same property are not necessarily all
equally relevant. One way to compare them is by their statistical power, i.e., their
respective probability of rejecting the null hypothesis $H_0$ when it is indeed false.
When we use statistical tests, we always want to have high power, or at least the
highest possible. Statistical power is defined as

$$1 - \beta,$$

where $\beta$ is the rate of type II error: the probability of incorrectly accepting $H_0$.
This rate is equivalent to the false negative rate in detection procedures. A classical
example of the most powerful test for $H_0: P = P_0$ against $H_1: P = P_1$, where $P_0$
is a submodel of $P_1$ (which keeps certain parameters of $P_1$ fixed), is the maximum
likelihood test (Neyman-Pearson theorem). It is also called the LRT (likelihood
ratio test).
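
Power can be estimated by simulation when no closed form is available. The sketch below uses a deliberately simple case chosen for illustration (a one-sided z-test of a zero mean with unit variance assumed known), so that the Monte Carlo estimate can be compared with the exact value; the sample size, alternative, and number of replications are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist


def rejects(sample, alpha=0.05):
    """One-sided z-test of H0: mu = 0 against H1: mu > 0 (unit variance assumed known)."""
    n = len(sample)
    z = math.sqrt(n) * sum(sample) / n
    return z > NormalDist().inv_cdf(1.0 - alpha)


def estimated_power(mu, n=25, n_rep=4000, seed=3):
    """Monte Carlo estimate of 1 - beta when the true mean is mu."""
    rng = random.Random(seed)
    hits = sum(rejects([rng.gauss(mu, 1.0) for _ in range(n)]) for _ in range(n_rep))
    return hits / n_rep


def exact_power(mu, n=25, alpha=0.05):
    """Closed-form power of the same test, for comparison."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - math.sqrt(n) * mu)


power_mc = estimated_power(0.5)
power_exact = exact_power(0.5)
```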


Example 4.5 (Adequacy x? test (discrete case) [784]) 


Let $\mathbf{x}_n = (x_1, \ldots, x_n)$ be a sample of iid realizations of X, taking values in
$\{1, \ldots, M\}$. It is required to test the null hypothesis $H_0$ according to which the
probabilities that X takes the values 1 to M are, respectively, $p_1, \ldots, p_M$ with
$\sum_{k=1}^{M} p_k = 1$. Denote

$$\hat{p}_k = \frac{1}{n} \sum_{j=1}^{n} \delta_{x_j = k},$$

where $\delta_{x_j = k} = 1$ if $x_j = k$ and 0 otherwise. Then

$$R_n = n \sum_{k=1}^{M} \frac{(\hat{p}_k - p_k)^2}{p_k} \tag{4.2}$$

follows, under $H_0$, the $\chi^2_{M-1}$ distribution.
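
The statistic (4.2) is straightforward to compute. The sketch below uses M = 3 categories with made-up counts and hypothesized probabilities (assumptions for illustration); with M − 1 = 2 degrees of freedom, the $\chi^2$ survival function has the closed form $\exp(-r/2)$, which avoids any table lookup.

```python
import math


def chi2_adequacy_statistic(sample, probs):
    """R_n = n * sum_k (p_hat_k - p_k)^2 / p_k for values in {1, ..., M}."""
    n, m = len(sample), len(probs)
    counts = [0] * m
    for x in sample:
        counts[x - 1] += 1
    return n * sum((counts[k] / n - probs[k]) ** 2 / probs[k] for k in range(m))


def chi2_sf_2dof(r):
    """Survival function of the chi-squared distribution with 2 degrees of freedom."""
    return math.exp(-r / 2.0)


# M = 3 categories, so R_n is compared with a chi-squared law with 2 degrees of freedom.
observed = [1] * 48 + [2] * 35 + [3] * 17
r_n = chi2_adequacy_statistic(observed, [0.5, 0.3, 0.2])
p_value = chi2_sf_2dof(r_n)
```

Here the p-value is far above the usual thresholds, so the hypothesized probabilities would not be rejected.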


Theorem 4.1 (LRT test (likelihood ratio test)) Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ be a
sample of iid random variables with common distribution P with density function
f. It is required to test $H_0: P = P_0$ versus $H_1: P = P_1$. Denote by
$L_i(\mathbf{X}_n) = \prod_{k=1}^{n} f(X_k)$ the maximized statistical likelihood under hypothesis $H_i$, with $i \in \{0, 1\}$
(see Sect. 4.2.2.3 for a detailed definition of the likelihood and its maximization).
Consider

$$R_n = 2 \log \frac{L_1(\mathbf{X}_n)}{L_0(\mathbf{X}_n)}.$$

Then, if $P_0$ designates a parametric model, with parameter $\psi$ such that $\psi \in \Psi_0$,
and $P_1$ is specified by $\psi \notin \Psi_0$, then $R_n$ asymptotically follows a mixture of Dirac
measures and $\chi^2$ distributions whose degrees of freedom are equal to or lower than
the number of constraints q imposed by the null hypothesis.


Many details on the mechanisms, specifications, and warnings on the interpretation
of statistical tests (parametric, nonparametric, compliance, adequacy, homogeneity,
independence, association tests, etc.) are provided in [207] and [356]. The
specific case of LRT tests is treated in particular detail in [349], Chap. 21. For the
specific case of extreme value models, readers interested in a general review may consult
[555].


Example 4.6 (LRT test) 


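In the specific case where $\Psi_0$ is in the strict interior of $\Psi$, then

$$R_n \xrightarrow{n \to \infty} \chi^2_q. \tag{4.3}$$

Consider a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with $\psi = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}^+_*$. It is
required to test $H_0: \mu = 0$ versus $H_1: \mu \neq 0$. A single constraint differentiates
the two hypotheses, and 0 is an interior point of $\mathbb{R}$. Then q = 1 and the result (4.3) applies.
If it is required to test $H_0: \mu = 0$ versus $H_1: \mu > 0$, the domain $\Psi$ is then
restricted to $\mathbb{R}^+ \times \mathbb{R}^+_*$, and (4.3) must be replaced by

$$R_n \xrightarrow{n \to \infty} \frac{1}{2} \delta_0 + \frac{1}{2} \chi^2_1.$$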
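
Both situations of this example can be sketched in code. For a normal sample with unknown variance, maximizing the likelihood under each hypothesis gives $R_n = n \log(\hat{\sigma}_0^2 / \hat{\sigma}_1^2)$, where $\hat{\sigma}_0^2$ and $\hat{\sigma}_1^2$ are the variance estimates with the mean fixed at 0 and left free, respectively. The synthetic data and function name below are assumptions of this illustration, and the p-values rely on the asymptotic distributions above.

```python
import math
import random


def lrt_normal_mean(sample, one_sided=False):
    """LRT of H0: mu = 0 for N(mu, sigma^2) with sigma unknown.
    Returns (R_n, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    s0 = sum(x * x for x in sample) / n               # variance MLE under H0 (mu = 0)
    s1 = sum((x - mean) ** 2 for x in sample) / n     # variance MLE with mu free
    if one_sided and mean <= 0.0:
        return 0.0, 1.0                               # constrained MLE sits on the boundary mu = 0
    r = max(0.0, n * math.log(s0 / s1))               # guard against rounding below zero
    tail = math.erfc(math.sqrt(r / 2.0))              # P(chi2 with 1 dof > r)
    return r, (0.5 * tail if one_sided else tail)


rng = random.Random(11)
data = [rng.gauss(0.4, 1.0) for _ in range(100)]      # synthetic sample with true mean 0.4
r_two, p_two = lrt_normal_mean(data)                  # H1: mu != 0, limit chi2_1
r_one, p_one = lrt_normal_mean(data, one_sided=True)  # H1: mu > 0, limit (delta_0 + chi2_1) / 2
```

When the sample mean is positive, the statistic is the same in both cases and the one-sided p-value is half the two-sided one, reflecting the mixture weight 1/2.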
4.1.4 Multidimensional Cases


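The study of joint hazards (Chap. 9) requires a generalization of the main concepts
and ideas presented in Sect. 4.1.1. Let $\mathbf{X} = (X_1, \ldots, X_d)^T$ be the vector of hazards
under consideration. The joint cumulative distribution function is defined by

$$F_{\mathbf{X}}(\mathbf{x}) = P(X_1 \leq x_1, \ldots, X_d \leq x_d),$$

where $\mathbf{x} = (x_1, \ldots, x_d)$. If the $X_i$ are continuous random variables and $F_{\mathbf{X}}$ is
differentiable, the joint probability density is given by

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^d F_{\mathbf{X}}}{\partial x_1 \cdots \partial x_d}(\mathbf{x}).$$

Thus, for any set $\mathcal{A} \subset \Omega \subset \mathbb{R}^d$,

$$P(\mathbf{X} \in \mathcal{A}) = \int_{\mathcal{A}} f_{\mathbf{X}}(\mathbf{u}) \, d\mathbf{u}.$$

In particular, if $\Omega = \mathbb{R}^d$,

$$F_{\mathbf{X}}(\mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f_{\mathbf{X}}(\mathbf{u}) \, du_1 \ldots du_d.$$

Each marginal density characterizing $X_i$ independently of the other variables can be
obtained by integrating them out. If $\Omega = \prod_{i=1}^{d} \Omega_i$, then

$$f_{X_i}(x_i) = \int_{\prod_{j \neq i} \Omega_j} f_{\mathbf{X}}(u_1, \ldots, u_{i-1}, x_i, u_{i+1}, \ldots, u_d) \prod_{j \neq i} du_j.$$

The notion of covariance helps us to summarize the pairwise dependencies between
the $X_i$:

$$\mathrm{Cov}(X_i, X_j) = \iint_{\Omega_i \times \Omega_j} (x_i - E[X_i]) (x_j - E[X_j]) \, f_{X_i, X_j}(x_i, x_j) \, dx_i \, dx_j,$$

where $E[X_i]$ is the marginal expectation of $X_i$ and $f_{X_i, X_j}$ the bivariate joint density
of $X_i$ and $X_j$, defined as the marginal:

$$f_{X_i, X_j}(x_i, x_j) = \int_{\prod_{k \neq i, j} \Omega_k} f_{\mathbf{X}}(\ldots, u_{i-1}, x_i, u_{i+1}, \ldots, u_{j-1}, x_j, u_{j+1}, \ldots) \prod_{k \neq i, j} du_k.$$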
Covariance generalizes the notion of variance: $\mathrm{Cov}(X_i, X_i) = V[X_i]$ (variance of
the marginal distribution of $X_i$). In practice, a multivariate distribution is often
summarized in terms of its vector of expectations $E[\mathbf{X}] = (E[X_1], \ldots, E[X_d])^T$ and
variance-covariance matrix

$$\Sigma = (\mathrm{Cov}(X_i, X_j))_{i,j},$$

or correlation matrix $\Sigma' = (\rho_{i,j})_{i,j}$, defined by

$$\rho_{i,j} = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{V[X_i] \, V[X_j]}}.$$

Each $\rho_{i,j}$ is between −1 and 1, giving information on linear dependency between
variables $X_i$ and $X_j$. Nonetheless, this summary is in general rather incomplete. For
example, if there is independence between $X_i$ and $X_j$, then $\mathrm{Cov}(X_i, X_j) = 0$, but the
reverse is not necessarily true. The correlation matrix $\Sigma'$ completely characterizes
the dependency structure only in very specific cases, notably when $\mathbf{X}$ is a Gaussian
vector, and in general does not provide an entirely relevant measure of dependency.
It is therefore important to fight the well-established practice of exaggerating this
indicator’s importance [791]. In Chap. 9, we describe what is meant by exhaustive
information on the dependency structure, and provide more useful tools for dealing
with multivariate distributions in extreme value theory.
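
The limits of correlation as a dependency measure can be seen on a very small example. In the sketch below (a hand-built illustration, not data from any study), the second variable is a deterministic function of the first, yet their empirical correlation is exactly zero.

```python
import math


def covariance(xs, ys):
    """Empirical covariance of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Empirical linear correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [3.0 * x + 1.0 for x in xs]      # Y = 3X + 1: perfectly linear dependence
squared = [x * x for x in xs]             # Y = X^2: deterministic but nonlinear dependence

rho_linear = correlation(xs, linear)      # equals 1
rho_squared = correlation(xs, squared)    # equals 0 despite full dependence
```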


4.1.5 Random Processes and Stationarity 


The probability distributions found in this book are special cases of discrete-time²
random processes (or stochastic processes), which describe the general behavior
of sequences of random variables $X_1, \ldots, X_n$. These variables do not have to be
independent and identically distributed (iid). The distribution $f_{X_i}$ of each $X_i$ can be
different for each i. Conversely, the $X_i$ may be dependent on each other while
still having similar distributions. In such cases, the process is said to be stationary.


Definition 4.1 A random process $X_1, \ldots, X_n$ is stationary if for any set of integers
$\{k_1, \ldots, k_s\}$ and integer m, the joint probability distributions of $(X_{k_1}, \ldots, X_{k_s})$ and
$(X_{k_1 + m}, \ldots, X_{k_s + m})$ are identical.


This definition allows us, in general, to characterize environmental time series data
more appropriately than with iid hypotheses, thus better taking into account a certain
inertia often found in natural phenomena. In effect, physical conditions leading to
extreme phenomena tend to persist over time. For instance, a high-water event can
be caused by several successive days of intense rainfall, and heatwaves by
high temperatures maintained over fairly long time periods. Extreme events therefore
have a tendency to occur in groups.

The stationarity hypothesis provides us with a technical aid, which we will take
advantage of in the book. Nevertheless, it relies on the supposition that the
dependency structure between $X_k$ and $X_{k-1}, X_{k-2}, \ldots$ is invariant after m iterations. Thus,
the existence of trends and seasonal cycles will invalidate this underlying
hypothesis, meaning that data selection and homogenization must be performed before any
modeling can occur (Chap. 5).


2 Continuous-time random processes are not treated in this book.

The stochastic mechanism leading to the random process sometimes needs to be
specified. This is true in particular when we are looking at whether the process
converges to a stationary one as n increases. Suppose, for instance, that $X_1, \ldots, X_n, \ldots$
represents temperature measurements at very close time points, and that we wish
to select ranges of indicators corresponding to stabilized temperatures. To do this, we
need to provide the probability distribution of $X_k$ conditional on $X_{k-1}, X_{k-2}, \ldots, X_1$,
and to use a Markov chain representation.


Definition 4.2 A random process $X_1, \ldots, X_n, \ldots$ is a Markov chain of order $r \in \mathbb{N}^*$
if for any $i > r$,

$$P(X_i | X_{i-1}, \ldots, X_1) = P(X_i | X_{i-1}, \ldots, X_{i-r}).$$

If furthermore r = 1 and this transition probability is independent of i, the process
is said to be homogeneous.


Markov chains of order 1 are the easiest to work with, and provide an important
generalization of the iid framework (an example is shown in Fig. 4.3). They also play an
important role in inference and sampling situations. Indeed, processes $\psi_1, \ldots, \psi_n$ can
be constructed as a way to explore a space $\Psi$, for instance, in a Bayesian framework
(Chap. 11), and the exploration mechanism is often constructed using a Markov chain
of order 1 which is known to converge to a stationary limit process,³ whose properties
(expectation, variance, etc.) can be estimated. We recommend the textbook [652] to
readers who wish to study this theoretical probabilistic framework further.
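
Convergence of a homogeneous Markov chain of order 1 to a stationary limit can be illustrated with a Gaussian autoregressive chain $X_i = \phi X_{i-1} + \sigma \epsilon_i$, whose stationary distribution for $|\phi| < 1$ is $\mathcal{N}(0, \sigma^2 / (1 - \phi^2))$. The choice of this particular chain, its parameters, and the burn-in length are assumptions made for illustration.

```python
import random


def simulate_ar1(n, phi=0.8, sigma=1.0, x0=10.0, seed=5):
    """Homogeneous Markov chain of order 1: X_i = phi * X_{i-1} + sigma * noise_i."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def mean_and_variance(values):
    """Empirical mean and variance of a sequence."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


path = simulate_ar1(60_000)
burn_in = 1_000                       # discard the transient started far from equilibrium
m_hat, v_hat = mean_and_variance(path[burn_in:])
v_limit = 1.0 / (1.0 - 0.8 ** 2)      # variance of the stationary distribution
```

After the burn-in period, the empirical mean and variance of the path estimate the properties of the stationary distribution, even though successive values are strongly dependent.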


4.1.6 Probabilistic and Statistical Modelings 


The terms probabilistic modeling and statistical modeling are often mixed together,
in particular in the engineering literature. They do, however, mean different things.
A probabilistic model, defined by its probability density $f_X$, is intended to represent
in this book the real physical phenomenon:

$$X \sim f_X,$$

while the statistical model is meant to represent the fact that one or several
observations of X, denoted by $x^*$, are linked to actual values x of X via a measuring device;
for example,

$$x^* = x + \varepsilon, \tag{4.4}$$

where $\varepsilon$ is noise coming from an imprecise measurement, often supposed random
with a Gaussian distribution. We denote by $f_\varepsilon$ its density, which is generally known⁴.
Knowledge of the relationship (4.4) allows us to characterize the probability
distribution of the random variable $X^*$ with samples $x^*$ as a convolution, whose density
$f_{X^*}$ is given by

$$f_{X^*}(u) = \int f_X(u - y) \, f_\varepsilon(y) \, dy,$$

which can be used to determine the statistical likelihood of an observation $x^*$ (see
Sect. 4.2).


3 We also say stationary distribution.

Fig. 4.3 Example of a non-stationary Markov chain of order 1


Nevertheless, it is essential for determining $f_X$, the main goal of the study, that
the distribution characterizes the greater part of variation in the observed data (from
$X^*$). Often, we will suppose that the influence of $\varepsilon$ is negligible, which is the same
as saying

$$f_{X^*}(u) \simeq f_X(u), \quad \forall u \in \Omega,$$


4 Notably due to specifications provided by the constructor of the measuring device in question, or
repeated testing under controlled conditions.


meaning that the probabilistic and statistical models are the same. This hypothesis is 
not always satisfied in practice, however, especially for historical data. Measurement 
noise can in fact be quite large, for several reasons: 


• measurements not taken directly (for example, historical torrential rain
records can be reconstructed using stratigraphic studies [299]);

• poor measurement precision (for example, a flood dating to the Middle Ages may be
described qualitatively, e.g., it washed away a bridge or covered fields, or quantitatively
but with a lot of uncertainty, e.g., marks on a wall of a now-demolished house)
[171, 601];

• measurements perturbed by an unknown bias linked to a poorly calibrated measuring
device (or one damaged by the hazard itself, especially in extreme situations) [4].


Even recent and present-day measurements can be affected by potentially high noise
when they are derived from calculations, rather than direct measurements, that are subject
to various sources of uncertainty (see also Sect. 5.1).


Example 4.7 (Calibration curves in hydraulics) 


Measurements of river flow are usually calculated by mapping from the directly 
measured river height, using a height-flow curve called a calibration curve 
[11]. This curve is in general established from gauging assessments, performed 
in relatively smooth flow conditions under certain physical hypotheses [387], 
and suffers from increasing uncertainty [165, 218, 529] when environmental 
conditions change (e.g., in cases of high or low water levels). Broadly, we can 
suppose that flow measurements taken at the extremes have higher measure- 
ment error than those taken in normal conditions. 


4.2 Bounding the Modeling Error 


4.2.1 Convergence of Models 


The reliability of statistical models depends on an approximation of reality whose 
error can be bounded under certain technical conditions. There are two types of 
approximation in this book: 


1. approximation of the unknown behavior of a supposed random quantity X (for 
example, the maximum of a sample over a given time interval) by a theoreti- 
cal one (for example, coming from extreme value statistics theory) that allows 
quantification and extrapolation;
2. approximation of a theoretical probabilistic model by an estimated statistical
model, in the sense that the theoretical model involves a priori unknown parameters
$\psi$ which will be estimated using real observed data. As these observations
$x_1, \ldots, x_n$ are considered drawn from a random variable $X$, the estimated parameters
are also treated as drawn from a random variable $\hat\psi_n$, known as a statistical
estimator.


The sequence $(\hat\psi_n)_n$ therefore represents an initial stochastic process which we would
like to use to estimate $\psi$ (fixed but unknown). The set of random variables $(X_n)_n$
produced by the estimated model represents a second stochastic process, which we
would like to use to approximate the real behavior $X$ (random variable with unknown
distribution).

It is therefore essential to make sure that these two types of approximations do 
not prevent the estimated statistical model—the practical tool here—providing a 
meaningful diagnosis in terms of being able to reproduce data as seen in reality, 
nor interfere with its use in predictive studies. An absolutely necessary condition 
is to have convergence between reality and the theoretical model, then between the 
theoretical model and the estimated one. Convergence is expressed in the form of 
a difference between the models which must necessarily decrease as the quantity 
of information (number of observations $n$) increases, disappearing completely when
$n \to \infty$.

In the probabilistic setting, this difference is random, so it is possible that it is
zero except for a number $k$ of given situations that form a subset of the event space
$\Omega$ with zero measure. Typically, this subset may be made up of a finite number of
point values, or elements at the border of $\Omega$. Indeed, in the continuous case we
know that, for distinct points $x_1, \ldots, x_k$,

$$P\left(X \in \{x_1, \ldots, x_k\}\right) = \sum_{i=1}^{k} P(X = x_i),$$

and that $P(X = x_i) = 0$ for all $x_i$ (as $X$ is continuous). In this particular situation, we
use the term: almost surely zero.

The notion of almost sure convergence follows almost automatically. It involves
verifying that the probability that the limit of a stochastic process $\hat\psi_n$ (or $X_n$) is the
target $\psi$ (or $X$) equals 1. For $X_n$, this is written as

$$X_n \xrightarrow{a.s.} X \;\Leftrightarrow\; P\left(\lim_{n \to \infty} X_n = X\right) = 1.$$

Another way to say the same thing is that the difference between the limit of the
process and $\psi$ (or $X$) is almost surely zero. This notion of convergence is the strongest
and most useful in practice to show that a random process does indeed converge to
a random variable, which could be simply a vector (or number). The term strong
consistency is also used to refer to this idea.

Other weaker definitions of convergence (weaker in the sense that they are implied by
almost sure convergence, while the reverse is not necessarily true) are also frequently
used:


1. Convergence in probability:

$$X_n \xrightarrow{P} X \;\Leftrightarrow\; \forall \epsilon > 0, \ \lim_{n \to \infty} P\left(|X_n - X| > \epsilon\right) = 0.$$

This plays a major role in many proofs of convergence in distribution (see
below), and also implies almost sure convergence of a subsequence of $(X_n)_n$: it
allows $X_n$ to deviate from $X$, but less and less significantly so as $n$ increases.
2. Convergence in distribution, or in law, which is implied by convergence in
probability, and corresponds to pointwise convergence⁵ in the probabilistic world:

$$X_n \xrightarrow{\mathcal{L}} X \;\Leftrightarrow\; \lim_{n \to \infty} F_n(x) = F(x),$$

where $(F_n, F)$ are the cumulative distribution functions of $X_n$ and $X$, respectively,
for any $x$ where $F$ is continuous. This notion of convergence does not
characterize the values of the stochastic process, but only its random behavior,
i.e., whether that of $X_n$ resembles to a greater and greater degree that of $X$.
Here, we say we have weak consistency.


Other notions of convergence (in $L^p$ norm in particular) are also commonly used.
Often, these can help to ensure useful deterministic convergence, of expectations
(moments) for instance, as shown in the following two theorems.


Theorem 4.2 Suppose that $X_n$ converges in $L^1$ norm to $X$ in $\Omega \subset \mathbb{R}$, i.e.,

$$\lim_{n \to \infty} E\left[\|X_n - X\|\right] = 0.$$

Then $\lim_{n \to \infty} E[X_n] = E[X]$.


Theorem 4.3 Suppose that $X_n \xrightarrow{\mathcal{L}} X$ with $(X_n, X) \in \Omega^2$, where $\Omega \subset \mathbb{R}$. Then, for
any real continuous and bounded function $g$ (and in particular the identity function
when $\Omega$ is bounded),

$$\lim_{n \to \infty} E[g(X_n)] = E[g(X)].$$

A collection of technical results allows us to combine these different types of 
convergence and their mappings through continuous functions (mapping theorems), 
to study more complex models. For an in-depth look at this, as well as a generalization 
to multidimensional cases, we recommend the book [758]. 
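
Convergence in probability can be visualized by simulation. The sketch below is an illustration under assumptions made here (exponential observations with mean 1, a tolerance `eps` of 0.2, the helper name `exceedance_frequency`): it estimates $P(|\bar X_n - \mu| > \epsilon)$ for the empirical mean $\bar X_n$ and shows that this probability shrinks as $n$ grows.

```python
import random

def exceedance_frequency(n, eps, n_rep, rng, mean=1.0):
    """Monte Carlo estimate of P(|mean of n exponential draws - mean| > eps)."""
    count = 0
    for _ in range(n_rep):
        xbar = sum(rng.expovariate(1.0 / mean) for _ in range(n)) / n
        if abs(xbar - mean) > eps:
            count += 1
    return count / n_rep

rng = random.Random(1)
freqs = {n: exceedance_frequency(n, eps=0.2, n_rep=1000, rng=rng)
         for n in (10, 100, 1000)}
for n, freq in freqs.items():
    print(f"n = {n:5d}  estimated P(|mean - 1| > 0.2) = {freq:.3f}")
```

The estimated probabilities decrease toward zero, which is exactly what the definition of convergence in probability requires for every fixed $\epsilon > 0$.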


5 For specialists: in the sense of the characteristic function (Lévy’s continuity theorem [207]).


4.2.2 Classical Statistical Estimation


Inference refers to the process of constructing one or several estimators of $\psi$,

$$\hat\psi_n = T(X_1, \ldots, X_n),$$

where $T$ is a function of random variables associated with samples $x_i$ obtained from
the phenomenon under consideration. As mentioned earlier, $\hat\psi_n$ is itself therefore a
random variable, whose observed value $T(x_1, \ldots, x_n)$ is called an estimate.


4.2.2.1 Important Properties of Statistical Estimators 
There are an infinite number of possible estimators for a given parameter vector $\psi$,
so it is important to be able to select one in particular. In classical statistics, the
following principles are used to order desirable estimators:


1. asymptotically, there must be consistency, i.e.,

$$\hat\psi_n \xrightarrow{?} \psi,$$

where ? represents, at best, almost sure convergence;
2. the quadratic error

$$\mathrm{LSE}(\hat\psi_n) = E\left[(\hat\psi_n - \psi)^T (\hat\psi_n - \psi)\right], \tag{4.5}$$

also known as the least square error (LSE) or $L^2$ error, should be as small as
possible. It can be written as the sum of the squared norm of the bias⁶ (or non-asymptotic
expected error)

$$B(\hat\psi_n) = E[\hat\psi_n] - \psi,$$

and the trace of the variance-covariance matrix of $\hat\psi_n$, which measures
the non-asymptotic inaccuracy of $\hat\psi_n$. These two terms cannot in general be minimized
simultaneously, and the minimization of (4.5) therefore necessarily proceeds from
a bias-variance equilibrium.


We also remark that producing a weakly consistent estimator $\hat\psi_n$ for an unknown
parameter $\psi$ then allows us to do the same for any function $h(\psi)$ of that parameter,
as long as it is differentiable. When the estimator converges asymptotically to a
Gaussian distribution, the differentiation procedure is called the delta method.


Theorem 4.4 (Multivariate delta method [570]) Let $\hat\psi_1, \ldots, \hat\psi_n$ be a stochastic process
in $\mathbb{R}^d$ and let $g : \mathbb{R}^d \to \mathbb{R}^q$ be a function that is differentiable and non-null at $\psi$. Denote
by $J_g(\psi)$ the Jacobian matrix of $g$ at $\psi$. Assume that $\sqrt{n}(\hat\psi_n - \psi)$ converges in
distribution toward the multivariate normal distribution $\mathcal{N}_d(0_d, \Sigma)$, with mean the
null $d$-dimensional vector $0_d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. Then

$$\sqrt{n}\left(g(\hat\psi_n) - g(\psi)\right) \xrightarrow{\mathcal{L}} \mathcal{N}_q\left(0_q, \; J_g(\psi)\, \Sigma\, J_g(\psi)^T\right).$$


6 An estimator $\hat\psi_n$ whose expected value is $\psi$ is called unbiased.
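
A one-dimensional check of the delta method can be carried out by simulation. The following is a sketch under assumptions chosen here for convenience: exponential data with mean `mu`, the empirical mean as estimator (so that $\Sigma = \mu^2$), and $g = \log$, whose derivative at $\mu$ is $1/\mu$. The delta method then predicts an asymptotic variance $(1/\mu)^2 \mu^2 = 1$ for $\sqrt{n}(\log \bar X_n - \log \mu)$.

```python
import math
import random
import statistics

rng = random.Random(2)
mu, n, n_rep = 2.0, 400, 2000

scaled = []
for _ in range(n_rep):
    xbar = sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n
    # sqrt(n) * (g(estimator) - g(target)) with g = log
    scaled.append(math.sqrt(n) * (math.log(xbar) - math.log(mu)))

# Jacobian of log at mu is 1/mu; the asymptotic variance of the mean is mu^2.
delta_variance = (1.0 / mu) ** 2 * mu ** 2
empirical_variance = statistics.variance(scaled)
print(delta_variance, empirical_variance)
```

The empirical variance of the rescaled, transformed estimator should be close to the value predicted by the theorem, here 1 whatever the value of `mu`.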


4.2.2.2 Least Squares Estimation 


The class of least squares estimators (LSE), which tries to find a compromise between
bias and variance, is thus naturally defined by rules of the type

$$\hat\psi_n = \arg\min_{\psi} \widehat{\mathrm{LSE}}(\psi), \tag{4.6}$$

where $\widehat{\mathrm{LSE}}$ is an empirical approximation of LSE, constructed as a function of
$X_1, \ldots, X_n$. Estimators of this type often have good consistency properties, but may
suffer from sensitivity to model choice and to the parameterization of $\psi$. Moreover, it is
not always clear that the expectation and/or variance in criterion (4.6) exist(s).


4.2.2.3 Maximum Likelihood Estimation 


The likelihood principle. A more general rule must necessarily be based on a more
complete characterization of the information contained by $X_1, \ldots, X_n$ about the
model’s parameter $\psi$, and needs to be generic and always well-defined. One way to
proceed is to work with the statistical likelihood $\ell$, which is defined (for known $x_i$)
as the joint density of the observations $X_i = x_i$ conditional on $\psi$. Thus, in the case
where observations $x_i$ are iid samples,

$$\ell(x_1, \ldots, x_n \mid \psi) = \prod_{i=1}^{n} f_X(x_i \mid \psi).$$


The likelihood can have a more complicated structure when samples are not indepen- 
dent or identically distributed, or have missing values replaced by thresholds when 
a measuring device’s limit has been reached. See, for instance, [804] for details on 
building a likelihood of correlated extreme observations. 


Example 4.8 (Measuring wind speed) 


Some old anemometers cannot measure the wind speed above a certain value, 
and replace the observation that should have been made by a maximum measur- 
able wind speed. This is referred to as a right-censored statistical observation. 
This type of partial observation is common in survival analysis [500].


The completeness of the information contained in the likelihood is a fundamental
principle of classical statistical theory. In fact, the maximum likelihood estimator
(MLE)⁷

$$\hat\psi_n = \arg\max_{\psi} \ell(X_1, \ldots, X_n \mid \psi) \tag{4.7}$$

defines the random variable with the most likely value of $\psi$ given the observations
$\{X_i = x_i\}_i$. Due to its quite general form, its significance, as well as its excellent
consistency properties, it is the most common and indeed most natural statistical estimator
we have.
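
To make the role of censoring in the likelihood concrete, consider the right-censored situation of Example 4.8. The sketch below rests on assumptions introduced here: an exponential model with rate `rate` (chosen because its censored MLE has a closed form), a hypothetical device ceiling of 25 m/s, and the function name `censored_exp_loglik`. An uncensored value contributes its density to the likelihood, whereas a censored value contributes the probability of exceeding the ceiling.

```python
import math
import random

def censored_exp_loglik(rate, values, censored):
    """Log-likelihood of an exponential model with right-censored observations.

    An uncensored value x contributes log f(x) = log(rate) - rate * x;
    a value censored at c contributes log P(X > c) = -rate * c.
    """
    ll = 0.0
    for x, is_censored in zip(values, censored):
        ll += -rate * x if is_censored else math.log(rate) - rate * x
    return ll

rng = random.Random(3)
true_rate, ceiling = 0.1, 25.0          # hypothetical ceiling of the device
raw = [rng.expovariate(true_rate) for _ in range(5000)]
values = [min(x, ceiling) for x in raw]  # what the device reports
censored = [x >= ceiling for x in raw]

# Closed-form maximizer of the censored log-likelihood for this model.
rate_mle = (len(values) - sum(censored)) / sum(values)
# Naive estimate treating the ceiling values as exact measurements.
rate_naive = len(values) / sum(values)
print(rate_mle, rate_naive)
```

Ignoring the censoring systematically overestimates the rate, i.e., underestimates the frequency of strong winds, which is precisely the kind of error to avoid in an extreme value study.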


Example 4.9 (Interval-censored data) 


Very frequently, a historical unidimensional datum (see page xxx), or an incorrectly
measured datum, can simply be described as a missing value $x_i$ located
between two known bounds $x_{i,\min} < x_{i,\max}$. The density term $f(x_i)$ must then
be replaced in the statistical likelihood (4.7) by

$$P\left(x_{i,\min} \le X \le x_{i,\max}\right) = P\left(X \le x_{i,\max}\right) - P\left(X \le x_{i,\min}\right) \tag{4.8}$$

$$= F(x_{i,\max}) - F(x_{i,\min}). \tag{4.9}$$

Moreover, if we make the assumption that the values $(x_{i,\min}, x_{i,\max})$ are themselves
random data (e.g., noisy data), described as realizations of variables
$(X_{i,\min}, X_{i,\max})$ with respective densities $(f_{i,\min}, f_{i,\max})$, the corresponding
likelihood term (4.9) becomes

$$\iint P\left(X_{i,\min} \le X \le X_{i,\max} \mid X_{i,\min} = y_1, X_{i,\max} = y_2\right) f_{i,\min}(y_1)\, f_{i,\max}(y_2)\, dy_1\, dy_2.$$


When the datum is multivariate (Chap.9), several situations can occur: one 
or several dimensions of X may be interval-censored, and extensive process- 
ing must be carried out to obtain useful statistical specifications (see [438] 
for survival analyses, and [676] for the specific case of multivariate extreme 
values). 
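
The interval-censored term (4.9) can be combined with ordinary density terms in a single likelihood. The sketch below is a simplified illustration under assumptions made here: a univariate exponential model, 400 exactly observed values, 100 hypothetical historical values known only within brackets of width 5, and a golden-section search (`golden_max`) as a generic one-dimensional maximizer. None of these choices comes from the references cited above.

```python
import math
import random

def interval_loglik(rate, exact, intervals):
    """Exponential log-likelihood mixing exact and interval-censored data."""
    ll = sum(math.log(rate) - rate * x for x in exact)
    # Interval-censored term: log(F(b) - F(a)) = log(exp(-rate*a) - exp(-rate*b)).
    ll += sum(math.log(math.exp(-rate * a) - math.exp(-rate * b))
              for a, b in intervals)
    return ll

def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi = d
        else:
            lo = c
        c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

rng = random.Random(4)
true_rate = 0.1
exact = [rng.expovariate(true_rate) for _ in range(400)]
hidden = [rng.expovariate(true_rate) for _ in range(100)]
# Each hidden value is only known to lie in a bracket [a, a + 5).
intervals = [(5.0 * math.floor(x / 5.0), 5.0 * math.floor(x / 5.0) + 5.0)
             for x in hidden]

rate_hat = golden_max(lambda r: interval_loglik(r, exact, intervals), 1e-3, 1.0)
print(rate_hat)
```

For this model the log-likelihood is concave in the rate, so a unimodal search is sufficient; more general models typically require a multidimensional optimizer.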


Theorem 4.5 (Central limit theorem for the MLE) Suppose that $X_1, \ldots, X_n$ are
independent and identically distributed. Let $q$ be the dimension of $\psi$. Then, under
quite general regularity (Wald) conditions,

$$\hat\psi_n \xrightarrow{\mathcal{L}} \mathcal{N}_q\left(\psi, I_\psi^{-1}\right), \tag{4.10}$$

where $\mathcal{N}_q$ is the $q$-dimensional multivariate normal distribution with variance-covariance
matrix $I_\psi^{-1}$, where $I_\psi$ is the Fisher information matrix, whose $(i, j) \in \{1, \ldots, q\}^2$
entry is given by (under the same regularity conditions)

$$I_\psi^{(i,j)} = -E\left[\frac{\partial^2 \log \ell(X_1, \ldots, X_n \mid \psi)}{\partial \psi_i\, \partial \psi_j}\right]. \tag{4.11}$$


7 For practical reasons, we often replace $\ell$ by the log-likelihood, $\log \ell$, in definition (4.7).


More details about the notion of information and the Fisher information matrix are 
given in Sect. 4.2.2.4. One important property of the MLE is that it is asymptotically 
efficient, i.e., its asymptotic covariance, given by the inverse of the Fisher matrix, is 
minimal over all unbiased estimators of $\psi$.
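
The Fisher information can be checked numerically on a simple case. The sketch below assumes, for illustration only, an exponential density $f(x \mid \lambda) = \lambda e^{-\lambda x}$ with a single observation: the score is $1/\lambda - x$, and both the mean squared score and the negative expected second derivative equal $1/\lambda^2$.

```python
import random

rng = random.Random(5)
rate = 0.5
draws = [rng.expovariate(rate) for _ in range(200_000)]

# Score of one observation: d/d(rate) of log f(x | rate) = 1/rate - x.
scores = [1.0 / rate - x for x in draws]
info_score = sum(s * s for s in scores) / len(scores)

# Second-derivative form: -d^2/d(rate)^2 log f(x | rate) = 1/rate^2 (constant in x).
info_hessian = 1.0 / rate ** 2

print(info_score, info_hessian)
# For n iid observations the information is n times larger, so the asymptotic
# standard deviation of the MLE is rate / sqrt(n).
```

The agreement between the two forms is what the regularity conditions of Theorem 4.5 guarantee; it may fail for models whose support depends on the parameter.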


4.2.2.4 Fisher Information 


The notion of information was proposed in the 1920s by the English researcher 
Ronald A. Fisher. Fisher’s approach is as follows: if we are interested in the charac- 
teristics of a large population, we can neither know nor deal with overly abundant 
information relating to each and every individual. The problem therefore becomes 
to be able to correctly describe the population by means of summaries that can be 
constructed from samples from the population to be studied. The more the summary 
that can be extracted from a sample correctly characterizes the reference population, 
the more informative it can be considered to be. 

Based on this assumption, Fisher defined information as the mean square of the
derivative of the log of the probability distribution being studied. Cramér’s inequality
then makes it possible to show that a high value of this information corresponds to
low variability (that is to say, a high degree of certainty) in the conclusions that can be
drawn from it. This idea, which is at the root of all statistical estimation and inference
theory, is exactly the same one discovered 20 years later by Shannon, expressed this
time in terms of probability rather than statistics. If $X$ is a sample from a probability
density $f(x \mid \psi)$, the Fisher information is defined by

$$I_\psi = E\left[\left(\frac{\partial \log f(X \mid \psi)}{\partial \psi}\right)^2\right].$$

In the case where the probability distribution depends on several parameters, $\psi$
becomes a vector. Then, the Fisher information is no longer defined as a scalar but
as a covariance matrix, called the Fisher information matrix:

$$I_\psi^{(i,j)} = E\left[\left(\frac{\partial \log f(X \mid \psi)}{\partial \psi_i}\right)\left(\frac{\partial \log f(X \mid \psi)}{\partial \psi_j}\right)\right].$$


4.2.2.5 Confidence Intervals


In practice, the asymptotic distribution of $\hat\psi_n$ is itself estimated by replacing the
unknown term $I_\psi$ by a consistent estimator $\hat I_n$ (in general, $\hat I_n = I_{\hat\psi_n}$), which allows
us to define confidence regions $C_{\hat\psi_n, \alpha}$ associated with an estimator $\hat\psi_n$ such that, as $n$
tends to infinity,

$$P\left(\psi \in C_{\hat\psi_n, \alpha}\right) = 1 - \alpha. \tag{4.12}$$

In particular, when we are especially interested in a certain coordinate $\psi_i$, Theorem
4.5 helps us define the (asymptotic) confidence interval of level $1 - \alpha$ associated
with $\psi_i$:

$$P\left(\psi_i \in \left[\hat\psi_{n,i} - z_{\alpha/2} \sqrt{\hat\sigma^2_{i,i}}, \ \hat\psi_{n,i} + z_{\alpha/2} \sqrt{\hat\sigma^2_{i,i}}\right]\right) = 1 - \alpha, \tag{4.13}$$

where $z_{\alpha}$ is the upper quantile of order $\alpha$ of the standard normal distribution (the
value exceeded with probability $\alpha$), and $\hat\sigma^2_{i,i}$ the $i$th entry of the diagonal of the
estimate of the inverse, $\hat I_n^{-1}$.

Equations (4.12) and (4.13) help us to evaluate the precision of the estimate of $\psi$
using the sample $x_1, \ldots, x_n$. However, note that the probability measure $P$ in equation
(4.12) concerns $\hat\psi_n$ and not $\psi$ (which is unknown but fixed). The confidence region is
therefore not defined by the probability $1 - \alpha$ that $\psi$ stands inside, but as the region
in which there is a priori a strong probability $1 - \alpha$ of obtaining an estimate of $\psi$. By
simulating a large number of samples similar to $x_1, \ldots, x_n$, the associated estimated
confidence regions should contain the true value of $\psi$ on average $100(1 - \alpha)\%$ of
the time. The confidence interval for the $i$th coordinate of $\psi$ therefore aims to contain
the true value $\psi_i$ with a certain probability linked to the asymptotic distribution of
the estimator $\hat\psi_n$, which presupposes that the statistical model is correct (and not in
terms of an absolute probability that is independent of the model).
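
The frequentist reading of a confidence interval can be verified by repeated simulation. The sketch below is an illustration under assumptions made here: an exponential model whose rate MLE is $n / \sum_i x_i$, an estimated Fisher information $n / \hat\lambda^2$, and a helper named `asymptotic_ci`. It counts how often the asymptotic 95% interval covers the true rate over many simulated samples.

```python
import math
import random
from statistics import NormalDist

def asymptotic_ci(sample, alpha=0.05):
    """Asymptotic confidence interval for the rate of an exponential model."""
    n = len(sample)
    rate_hat = n / sum(sample)                 # maximum likelihood estimate
    # Inverse of the estimated Fisher information n / rate_hat**2.
    se = rate_hat / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # value exceeded with probability alpha/2
    return rate_hat - z * se, rate_hat + z * se

rng = random.Random(6)
true_rate, n, n_rep = 0.2, 200, 2000
hits = 0
for _ in range(n_rep):
    lo, hi = asymptotic_ci([rng.expovariate(true_rate) for _ in range(n)])
    hits += lo <= true_rate <= hi
coverage = hits / n_rep
print(coverage)
```

The coverage is close to, but not exactly, the nominal level: the interval is only asymptotically valid, and the approximation can be noticeably poorer for the small samples typical of extreme value studies.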

As is the case for the LSE, the MLE does not always have a closed-form expression
(particularly for models used in extreme value theory), and often must be calculated
using numerical methods. The LSE and MLE may not be unique for complex models,
and the MLE may not even be defined (leading to an infinite likelihood). This is
nevertheless rare in extreme value theory situations. Unlike the LSE, the MLE is
always invariant under reparametrization, i.e., the MLE of $h(\psi)$ is $h(\hat\psi_n)$, provided
$h$ is bijective. This property is crucial for avoiding paradoxes and inconsistencies.
Indeed, if we replace the observation $x$ with a bijective mapping $y = d(x)$, the
model parametrized by $\psi$ is replaced by one parametrized by a bijective mapping
$\psi' = h(\psi)$. However, the information delivered by $x$ and $y$ is the same, so any
estimation rule $x \mapsto \hat\psi(x)$ should be such that $y = d(x) \mapsto \hat\psi'(y) = h(\hat\psi(x))$.

One final argument pleads in favor of the MLE: the maximized likelihood constitutes
the fundamental ingredient of most model selection methods. Equipped with
a penalty term linked to the number of degrees of freedom of a model [26, 694],
the estimate of $\ell(x_1, \ldots, x_n \mid \hat\psi_n)$ provides a useful diagnostic tool for evaluating the
pertinence of one model with respect to another, for a given data set. We suggest that
readers wanting to learn more about this subject consult the textbook [620].
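
A penalized maximized likelihood can be computed in a few lines. The sketch below uses the Akaike-type criterion $2k - 2 \log \ell$ (with $k$ the number of parameters) as one possible penalty; whether this is exactly the criterion discussed in the cited references is an assumption. The two candidate models (exponential and log-normal) and the simulated data are also choices made here because both models have closed-form MLEs.

```python
import math
import random

def aic(loglik, n_params):
    """Akaike-type criterion: smaller values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

rng = random.Random(7)
data = [rng.lognormvariate(1.0, 0.4) for _ in range(500)]
n = len(data)

# Exponential model: closed-form MLE rate = 1 / mean (one parameter).
rate_hat = n / sum(data)
loglik_exp = n * math.log(rate_hat) - rate_hat * sum(data)

# Log-normal model: MLEs are the mean and variance of log(x) (two parameters).
logs = [math.log(x) for x in data]
mu_hat = sum(logs) / n
var_hat = sum((v - mu_hat) ** 2 for v in logs) / n
loglik_lognorm = -sum(logs) - 0.5 * n * math.log(2 * math.pi * var_hat) - 0.5 * n

scores = {"exponential": aic(loglik_exp, 1), "lognormal": aic(loglik_lognorm, 2)}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

The criterion only ranks the candidate models against each other for the data set at hand; it says nothing about whether the best-ranked model is adequate in absolute terms.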

The vast majority of the tools and methods presented in this book come under 
the classical statistical framework, where the two estimators we have introduced are 
frequently used and compared. This choice is both a matter of theoretical properties 
and of a certain empirical robustness, the knowledge of which comes from practice. 
However, the classical statistical framework is an approximation to the larger, so- 
called Bayesian decision-making framework, which will be introduced in Chap. 11. 
In that framework, $\psi$ is considered not fixed but random, which makes it possible to
introduce, in addition to the statistical likelihood, a priori information, depending on 
the type of model and its use, and possibly expert knowledge about the phenomenon 
being examined. 


4.3 Statistical Regression 


We conclude this chapter by briefly recalling the main concepts of parametric and 
nonparametric regression, which will prove useful in some parts of the following 
chapters. The interested reader, confronted with the vast literature on this subject, 
may refer to Sen and Srivastava’s recent book [696]. The objective of a stochastic
regression model is to determine how the expectation of a random variable $Y$ depends
on the realizations $x$ of a vector $X \in \chi$ of explanatory covariates, possibly also
random, through the regression (or link) function $g$:


E[Y|X = x] = g(x). 


4.3.1 Parametric Approach 


A parametric approach consists in placing on $g$ a form a priori parameterized by a
finite number of parameters to be estimated; linear regression, for instance, presupposes
that

$$g(x) = \beta_0 + \beta^T x. \tag{4.14}$$

The vector $(\beta_0, \beta)$ can be estimated using different approaches. In this framework,
maximum likelihood estimation is similar to a weighted least squares approach,
provided the whole model can be written as

$$Y = \beta_0 + \beta^T X + \varepsilon,$$

where $\varepsilon$ is a centered noise (with zero expectation), uncorrelated, and with a common
variance $\sigma^2$, and assuming that independent observations of $Y$ and related values of
$X$ are available.

Depending on the dimensions of $X$ and $Y$, on hypotheses on the correlation structure
between covariates and/or between the dimensions of $Y$, and finally on
the number of available data, the parameter estimation of a parametric regression
model (linear or not) constitutes a class of fundamental problems in statistics. The
primary objective of the many estimation methods, which are generally based on a
maximization of likelihood penalized by a smoothing term (e.g., Lasso and ridge
regression), is to ensure a satisfactory compromise between interpolation power (low
bias and possibly high variance) and smoothing (low variance but possibly high bias).
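
The linear model (4.14) with a single covariate can be fitted in closed form. The sketch below is a simplified illustration: the simulated data, the function name `fit_line`, and the choice of a ridge-type penalty applied to the slope only (the intercept being left unpenalized) are assumptions made here.

```python
import random

def fit_line(xs, ys, penalty=0.0):
    """Least squares fit of y = b0 + b1 * x.

    With penalty > 0 the slope minimizes the sum of squared residuals plus
    penalty * b1**2 (ridge-type shrinkage); penalty = 0 gives ordinary least squares.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / (sxx + penalty)
    return mean_y - b1 * mean_x, b1

rng = random.Random(9)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]   # true b0 = 1, b1 = 2

b0_ols, b1_ols = fit_line(xs, ys)
b0_ridge, b1_ridge = fit_line(xs, ys, penalty=500.0)
print(b0_ols, b1_ols, b0_ridge, b1_ridge)
```

The penalized slope is pulled toward zero: the fit becomes smoother (lower variance) at the price of a bias, which is the compromise described above.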


4.3.2 Nonparametric Approach 


A nonparametric approach does not assume, e contrario, any analytical form such as
(4.14) or, necessarily, a limited number of parameters defining the function $g$. Two main
families of nonparametric regression methods are commonly used. Suppose we have
a sample $\{x_i, y_i\}_i$ of realizations of $(X, Y)$.


1. Kernel regression [543, 777] estimates $g$ by a kernel estimator defined as

$$\hat g_n(x) = \sum_{i=1}^{n} \omega_i(x, h)\, y_i$$

with

$$\omega_i(x, h) = \frac{K\left((x_i - x)/h\right)}{\sum_{j=1}^{n} K\left((x_j - x)/h\right)},$$

where $K(u)$, with $u = (x_i - x)/h$, defines a kernel function (namely a probability
density function with maximal value at 0 and depending only on the distance
$|x_i - x|$, therefore a symmetric function) and $h$ a smoothing parameter named
bandwidth. Each weight $\omega_i(x, h)$ decreases as the distance $|x_i - x|$ increases,
and increases when $h$ becomes high. The selection of $h$ results from both a bias-variance
trade-off and an arbitration on the regularity of the curve $x \mapsto \hat g_n(x)$
(smooth versus irregular), which is very generally produced by approximating a
criterion $C(h)$ defined as the Mean Integrated Squared Error (MISE)

$$C(h) = E\left[\int_{\chi} \left(\hat g_n(x) - g(x)\right)^2 dx\right].$$



This criterion being not directly usable ($g$ being unknown), it is typically replaced
by a cross-validation criterion between the $y_i$ and each predictor $\hat g_{-i}(x_i, h)$ built
by using the whole sample from which the point $(x_i, y_i)$ is removed.


2. Local polynomial regression is an alternative approach that proves to be more
robust to extreme values of $X$ for estimating $g$. It postulates that the value $g(x)$ can
be approximated by the value of a parametric function estimated in a neighborhood
$\mathcal{V}(x)$ of $x$ (locally estimated). It may be relevant, for instance, to approximate
$g(x)$ with a polynomial of degree 1

$$\hat g(x) = \hat\alpha(x) + \hat\beta(x)^T x,$$

where the estimators $\hat\alpha(x)$ and $\hat\beta(x)$ are determined by minimizing the weighted
criterion

$$\sum_{x_i \in \mathcal{V}(x)} w_i \left(y_i - \alpha(x) - \beta(x)^T x_i\right)^2,$$

and where the weight $w_i$ may be chosen constant (LOESS regression [168]) or
be defined, again, through a kernel function (LOWESS regression [169]).
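
Kernel regression with a cross-validated bandwidth can be sketched as follows. This is an illustration under assumptions made here: a Gaussian kernel, weights normalized in the usual Nadaraya-Watson way, a small grid of candidate bandwidths, and simulated data around a sine curve. The leave-one-out score plays the role of the empirical substitute for the MISE.

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_regression(x, xs, ys, h, skip=None):
    """Kernel-weighted average of the y values at x; `skip` leaves one point out."""
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        k = gaussian_kernel((xi - x) / h)
        num += k * yi
        den += k
    return num / den

def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = [(yi - kernel_regression(xi, xs, ys, h, skip=i)) ** 2
              for i, (xi, yi) in enumerate(zip(xs, ys))]
    return sum(errors) / len(errors)

rng = random.Random(8)
xs = [rng.uniform(0.0, 6.0) for _ in range(150)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]

bandwidths = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
scores = {h: loo_cv_score(xs, ys, h) for h in bandwidths}
h_best = min(scores, key=scores.get)
print(h_best, kernel_regression(1.5, xs, ys, h_best), math.sin(1.5))
```

Small bandwidths follow the noise (high variance), large ones flatten the curve (high bias); the cross-validation score is typically minimized in between. A local polynomial fit would replace the weighted average by a weighted least squares line in each neighborhood.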


Chapter 5
Collecting and Analyzing Data


Marc Andreewsky and Nicolas Bousquet 


Abstract This chapter focuses on the first steps of a statistical study of extreme 
values based on real hazard data. Illustrated by numerous examples, the qualitative 
and quantitative characteristics of these data are analyzed in order to come as close 
as possible to the theoretical conditions of application—in particular the notion of 
asymptotism—and to statistically characterize their nature (regular, truncated, cen- 
sored...). This chapter is, therefore, an essential prerequisite for the concrete appli- 
cation of the theoretical tools presented in the following chapters. 


5.1 Introduction 


Extreme values are, by definition, rare and difficult to observe. Nevertheless, the more 
observations available, the clearer their statistical behavior becomes. The width of 
confidence intervals for statistical estimators is inversely proportional to the square 
root of the number of observations (Sect. 4.2.2.3). For this reason, data collection is 
a crucial and sensitive part of the study of extreme values. It is also very important to 
include the results of any study carried out in the past on the same problem, in order 
to compare them. The importance of this step should be particularly emphasized 
because, as mentioned earlier, validation is tricky, so any extra information that 
facilitates decision-making can be extremely useful. 

In order to make it possible to apply theorems mentioned in Chap. 4, the most 
useful data has two forms: 


• A sample of maxima of the phenomenon over successive time periods (Block Maxima
approach), typically one value per month or year;

• A sample of events exceeding a threshold over a given time period (Peaks Over
Threshold approach). This approach necessarily requires that we be able to directly
estimate the particular threshold above (or below) which we consider ourselves to
be dealing with an extreme event; or alternatively, that we be able to estimate the
duration of the phenomenon, i.e., a temporal threshold below (or above) which the
extreme phenomenon can be considered ended.
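
The two forms of data listed above can be extracted from a raw time series as follows. This is a minimal sketch: the function names, the toy series, and the declustering rule (an event ends after `min_gap` consecutive values at or below the threshold, used here as a simple stand-in for the temporal threshold) are assumptions introduced for illustration.

```python
def block_maxima(series, block_size):
    """Maximum of each complete block of `block_size` consecutive values."""
    n_blocks = len(series) // block_size
    return [max(series[b * block_size:(b + 1) * block_size])
            for b in range(n_blocks)]

def peaks_over_threshold(series, threshold, min_gap):
    """Largest value of each cluster of exceedances of `threshold`.

    A cluster ends once `min_gap` consecutive values are at or below the
    threshold, so that one physical event yields a single peak.
    """
    peaks, current, gap = [], None, 0
    for value in series:
        if value > threshold:
            current = value if current is None else max(current, value)
            gap = 0
        elif current is not None:
            gap += 1
            if gap >= min_gap:
                peaks.append(current)
                current, gap = None, 0
    if current is not None:
        peaks.append(current)
    return peaks

# Toy series (arbitrary units), with blocks of 4 values.
series = [1, 5, 2, 7, 8, 1, 1, 1, 6, 2, 9, 3]
print(block_maxima(series, 4))                              # [7, 8, 9]
print(peaks_over_threshold(series, threshold=4, min_gap=2))  # [8, 9]
```

In this toy series the values 5, 7 and 8 belong to the same event because they are never separated by two consecutive low values, so only the peak 8 is retained.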


Understanding the nature of these data, collecting them, sorting them according 
to their quality and representativeness, filling in the gaps, and ensuring a sufficient 
amount of information for conducting an extreme statistics study are all the issues 
addressed in this chapter. 


5.2 Looking for Available Data 


5.2.1 Building Databases 


It is essential to construct the largest possible database of observations or estimates 
of the study variable, whether these be continuous, discrete, measured, historical, 
paleo-historical, simulated, etc. For each discipline, the sources and methods used 
to collect data may be very different. Here are some examples. 


Example 5.1 (Maritime surges (1)) 


Numerous sea (and ocean) level measurements on the French coast are pro- 
vided by the Marine Hydrographic and Oceanographic Service* (SHOM) and 
can be downloaded from http://www.shom.fr. The investigation of historical 
extreme maritime tides has been the subject of the recent works [121, 350]. 


Example 5.2 (Maritime surges (2)) 


Observations of sea level are generally available at hourly time-points [40]. 
Modeling of instantaneous maxima (i.e., peak sea level reached during high 
tides) cannot rely on this directly observed data, since the maximum level is 
unlikely to be produced at the exact time of measurement. This modeling can 
instead be developed using polynomial interpolation to reconstruct the signal 
between pairs of sea level observations located to either side of the supposed 
maxima, one hour apart. The diagram in Fig. 5.1 illustrates this calculation.

Fig. 5.1 Illustration of the interpolation allowing the calculation of the tide peak level
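
One simple way to carry out such an interpolation is to fit a parabola through the largest hourly observation and its two neighbors. The sketch below is an assumption about the procedure (the text does not specify the degree of the polynomial or the number of points used), and the function name `parabolic_peak` and the sea level values are hypothetical.

```python
def parabolic_peak(t_mid, step, y_prev, y_mid, y_next):
    """Time and height of the vertex of the parabola through three
    equally spaced observations (t_mid - step, t_mid, t_mid + step)."""
    curvature = y_prev - 2.0 * y_mid + y_next
    if curvature >= 0:
        # No interior maximum: fall back on the largest observed value.
        values = (y_prev, y_mid, y_next)
        k = values.index(max(values))
        return t_mid + (k - 1) * step, values[k]
    offset = 0.5 * (y_prev - y_next) / curvature   # in units of `step`
    peak = y_mid - 0.25 * (y_prev - y_next) * offset
    return t_mid + offset * step, peak

# Hypothetical hourly sea levels (m) at 13:00, 14:00 and 15:00.
t_peak, level_peak = parabolic_peak(14.0, 1.0, 3.31, 4.91, 4.51)
print(t_peak, level_peak)
```

The reconstructed peak is higher than the largest hourly value, which illustrates why working directly with hourly observations tends to underestimate instantaneous maxima.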


Example 5.3 (Extreme tides) 


If the study variable is the extreme tide, it is necessary to subtract from the 
sea level calculated above, the theoretical extreme tide level [40], which can 
be retro-predicted; indeed, the theoretical tides are governed by the gravita- 
tional interaction between the earth and other large objects in the solar system, 
principally the sun and moon [636] (p. 249). 


Example 5.4 (Reconstructed hydrological data) 


The construction of a representative dataset (hydrological flow series) can be 
carried out with the help of measurements from a neighboring site [518]. A flow 
naturalization procedure can be used to account for the influence of hydraulic 
structures on flow (see Example 3.2).


Example 5.5 (Sticky snow load data)


Electrical cables can be subjected to a heavy load, potentially exceeding a 
safe-load criterion, due to accretion of sticky snow [214]. Load data, useful 
for this type of study, can be constructed using meteorological observations 
(see Example 3.3). 


Missing data can be particularly detrimental to extreme value statistics studies. 
Certain measuring devices, usually old and somewhat fragile, can break down just 
when a particularly extreme event occurs. To give an example, tide gauges installed 
before 1990 frequently break down during storms [197]. Torrential rain can also 
be significantly underestimated (by more than 10%) depending on the type of rain 
gauge used [681], mainly due to evaporation. Observations of the same hazard at 
neighboring sites, or combined hazards, may indicate that an extreme data value was 
missed. This reaching of the limits of a measurement system characterizes what we 
call censored data (see Example 4.8). 


Example 5.6 (Meteorological data) 


It is common practice to replace missing values of a meteorological variable 
(temperature, wind speed, etc.), if limited in number, by the variable’s mean 
(or by linear interpolation), while being cautious not to modify the effective 
time of observation. This, provided that the missing values do not correspond 
to the known occurrence of an extreme phenomenon (e.g., a missing wind 
speed measurement during a storm). In such cases, it is better to try to estimate 
the missing value from neighboring measurements [703]. 
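
The rule described in Example 5.6 can be sketched in code. The function below is an illustrative assumption (its name, the use of `None` for missing values, and the boolean `storm_flags` marking known extreme episodes are introduced here): isolated gaps are filled by linear interpolation between the two neighbors, except when the gap coincides with a known extreme event.

```python
def fill_gaps(values, storm_flags):
    """Linearly interpolate isolated missing values (None), except during
    time steps flagged as a known extreme event."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is None and not storm_flags[i] and 0 < i < len(values) - 1:
            left, right = values[i - 1], values[i + 1]
            if left is not None and right is not None:
                filled[i] = 0.5 * (left + right)
    return filled

# Hypothetical hourly wind speeds (m/s); the second gap occurs during a storm.
wind = [10.0, None, 14.0, None, 30.0]
storm = [False, False, False, True, False]
print(fill_gaps(wind, storm))   # [10.0, 12.0, 14.0, None, 30.0]
```

The value left missing during the storm would then have to be estimated from neighboring stations rather than from the series itself.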


Biases, and more generally measurement errors, are also important factors that can 
undermine the results of statistical studies. Ideally, each device should be calibrated 
before use, and each measurement accompanied by a description of the potential 
measurement error. 

This is all the more important since extreme value statistical studies are likely to have
to resort to historical data (see the Preface and [72, 692]), to compensate for small
sample sizes. Historical data is subject to measurement or reconstruction errors¹
which depend on the time period in question.²

Such data can generally be presented as:


• punctual and scattered values associated with the intensity of an extreme event
[707];

• a sample measured by obsolete and possibly very imprecise processes [638];

• a threshold of perception over a historical period: between two major events
observed in the past, it is assumed that the absence of data indicates that no events of
at least the same intensity as the lower of the two “surrounding” events occurred
(otherwise it would have been indicated, unless there was a known stop in the
measurements, for example during a conflict).


1 See [553, 603], which highlight, in hydrology, the systematic biases and random errors that appear
when reconstructing historical data.

2 For instance, satellite temperature surveys began only in the late 1970s, and sea level rise has only
been monitored by satellite since 1992 [607].


The latter hypothesis allows historical data to be statistically treated as censoring by 
threshold data [588]. Alternatively, the first situation underlies the hypothesis that 
only a limited number of the most intense events of the past have been preserved in 
societal memory (censoring by number [707]). The statistical translation of these data 
has been particularly examined by [588, 638] in a context where only one hazard is 
considered, and then by [676] in cases where several extreme hazards are cumulated. 
A generic statistical specification of an interval-censoring situation is proposed in 
Example 4.9. 
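As an illustration of the threshold-censoring idea, the following minimal sketch writes a log-likelihood in which the recent (systematic) annual maxima are observed exactly, a few historical events above a perception threshold are observed exactly, and the remaining historical years are only known to lie below that threshold. The Gumbel model, the function name `censored_loglik`, and the simulated numbers are assumptions made for the example; they are not taken from [588] or from Example 4.9.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gumbel_r


def censored_loglik(params, systematic, historical, threshold, n_hist_years):
    """Log-likelihood of annual maxima with a threshold-censored historical period.

    Hypothetical helper: a Gumbel model is assumed for simplicity.
    """
    loc, scale = params
    if scale <= 0:
        return -np.inf
    dist = gumbel_r(loc=loc, scale=scale)
    # Systematic record: every annual maximum is observed exactly.
    ll = dist.logpdf(systematic).sum()
    # Historical events above the perception threshold: observed exactly.
    ll += dist.logpdf(historical).sum()
    # Remaining historical years: only known to be below the threshold.
    n_below = n_hist_years - len(historical)
    ll += n_below * dist.logcdf(threshold)
    return ll


# Illustrative (simulated) data: 30 recent years, 3 known events in 150 older years.
rng = np.random.default_rng(0)
systematic = gumbel_r.rvs(loc=100.0, scale=20.0, size=30, random_state=rng)
historical = np.array([210.0, 235.0, 260.0])
threshold = 200.0

result = minimize(
    lambda p: -censored_loglik(p, systematic, historical, threshold, 150),
    x0=[systematic.mean(), systematic.std()],
    method="Nelder-Mead",
)
print("estimated (loc, scale):", result.x)
```

The combinatorial constant of the binomial censoring term is omitted since it does not depend on the parameters.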

Since Le Roy Ladurie’s pioneering work on climate history and phenology* [460], 
the study of a wide variety of historical sources (see Preface for details) has con- 
sistently revealed interesting qualitative and quantitative information for building 
historical datasets [172, 519]. Thus, the use of this type of data is very valuable, 
if not almost indispensable, for dealing with phenomena such as tsunamis [324] or 
high sea levels [72, 128]. We refer the reader interested in these methods of
reconstruction to the works of specialized historians³ [308, 311], who often deplore the
rapid disappearance of the memory of disasters [307]. Surveys conducted after the
2010 Xynthia storm have shown that historical lessons had, until recently, often been
overlooked [103, 319], despite a relatively recent history.


Example 5.7 (Reconstructed historical floods) 


The reconstruction of extreme historical floods must be the subject of meticu- 
lous historical work [581]. Payrastre and his co-authors [603] propose a list of 
desirable characteristics of these historical floods. A recent pedagogical exam- 
ple of such a reconstruction work, concerning the past floods of the Gardon 
(a river located in the French Languedoc-Roussillon-Midi-Pyrénées region), 
is carried out in [217]. 


Finally, potential problems due to combinations of hazards have to be detected.
Do we have conjoint data (e.g., swell and extreme tides)? If not, is it possible to


3 The funding of scientific projects in the environmental sciences, including many historians, testifies
to the growing importance given to these data. See, for example, the OPHELIE project (PHenological
Observations to Rebuild Europe’s Climate), coordinated by the Pierre-Simon Laplace Institute:
http://www.unicaen.fr/histclime/ophelie.php.
reconstruct certain conjunction scenarios or define conjoint data as a combination of
“true measurements” and a probable interval in which a missing measurement value
is likely to be?


5.2.2 Analyzing the Quantity of Available Data 


Methods for checking that the length of an available data series is appropriate for a 
given study are not generic, and remain the subject of debate [174, 484, 645]. Each 
situation requires expert knowledge and input. Limit theorems underlying the use of 
extreme value statistical distributions are valid asymptotically. Demonstrating that 
we have attained this asymptotic validity “in practice” is no easy task. The available 
observed data can be used to guide this decision. In general, the length of the data
series varies between 20 and 60 years. For example, on average, French extreme tide
data from tide gauges cover a period of 30 years [780].

The consensus is that a sample of fewer than 20 maxima is too small to
estimate the value of a one-in-a-thousand event using GEV modeling. In general, it
is good practice to systematically test whether a GEV distribution with a
shape parameter of zero (Gumbel distribution, or exponential distribution in the
GPD case) fits the sample just as well as the full GEV distribution. This comparison
can be done using a likelihood ratio test, as described in Theorem 4.1. However, the
test is only valid asymptotically, which is at odds with the small-sample situation we
find ourselves in. The estimation of the shape parameter ξ in the GEV distribution can
be so uncertain that it seems more reasonable, in practice, to set it to zero.
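The comparison between the Gumbel and the full GEV fits can be sketched as follows with SciPy. This is only an illustration under stated assumptions: the function name `gumbel_vs_gev_lrt` is hypothetical, the data are simulated, and the chi-square reference distribution with one degree of freedom is the asymptotic one, so the p-value should be read with caution for small samples. Note that SciPy's `genextreme` uses a shape parameter `c` whose sign is opposite to the usual ξ.

```python
import numpy as np
from scipy.stats import chi2, genextreme, gumbel_r


def gumbel_vs_gev_lrt(maxima):
    """Likelihood ratio test of H0: shape = 0 (Gumbel) against the full GEV."""
    maxima = np.asarray(maxima, dtype=float)
    # Fit under H0 (Gumbel).
    loc0, scale0 = gumbel_r.fit(maxima)
    ll0 = gumbel_r.logpdf(maxima, loc=loc0, scale=scale0).sum()
    # Fit the full GEV, starting the optimisation from the Gumbel estimates.
    c1, loc1, scale1 = genextreme.fit(maxima, loc=loc0, scale=scale0)
    ll1 = genextreme.logpdf(maxima, c1, loc=loc1, scale=scale1).sum()
    # The statistic is non-negative in theory; clip numerical noise.
    stat = max(0.0, 2.0 * (ll1 - ll0))
    p_value = chi2.sf(stat, df=1)
    xi_hat = -c1  # SciPy's shape c equals -xi
    return stat, p_value, xi_hat


# Simulated sample of 40 annual maxima drawn from a Gumbel distribution.
rng = np.random.default_rng(0)
sample = gumbel_r.rvs(loc=10.0, scale=2.0, size=40, random_state=rng)
stat, p_value, xi_hat = gumbel_vs_gev_lrt(sample)
print(f"LR statistic = {stat:.3f}, p-value = {p_value:.3f}, xi = {xi_hat:.3f}")
```

A large p-value means that the data do not contradict the Gumbel simplification; it does not prove that the shape parameter is zero.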

While the POT approach can take advantage of more data than the MAXB 
approach, researchers working with this extreme value method still generally avoid 
samples that are too small. The fit to a GPD distribution is often not considered 
sufficiently reliable if the sample has fewer than 50 independent values. 


Remark 5.1 The independence of annual maxima must be scrupulously checked 
by going back to the original data. An event may occur at the very end of a year and 
continue at the beginning of the next. 


Some experts even advise not to estimate quantiles associated with return periods 
that exceed 100 or 200 years for a local analysis [727], i.e., for data recording periods 
generally below 50 years. This criterion is often too strict since a study of extreme 
values must necessarily proceed in a framework of extrapolation of observed data 
collected at very low probability levels (Sect. 3.1.1.5). Other experts believe that it 
is reasonable to estimate quantiles associated with return periods of up to five times 
the observation period [627]. Thus, some authors have proposed estimates of values 
associated with return periods of up to 1 000 years (sometimes even 10 000 years 
[803]). While some researchers point out that such extreme values may have little or 
no scientific [107, 173] or practical (e.g., [219] in a financial context) basis, others 
have accepted the use of these probability distributions as consensus tools to define 
particularly high dimensioning values [221]. 
While it has long seemed that there was no absolute rule in this area, recent work
has made it possible to frame theoretically the validity of an extrapolation [29]. This 
validity remains in all cases conditional on the validity of the data production model, 
and for this reason, the “reasonableness” of an extrapolation must be constantly 
questioned. Clearly, it is not advisable to extrapolate a ten-year probability from ten 
years of data. 


Remark 5.2 In the case of joint hazards, the amount of data and the information 
it contains must also be sufficient to allow estimation of the correlation structure 
between the hazards, and in particular, the correlation between extreme events. More 
details are provided in Chap. 9. 


When we wish to estimate a return level, i.e., a quantile of the statistical distribution 
being used, it is possible to guard against high estimation uncertainty due to small 
sample sizes by calculating an upper bound to the quantile. For example, the ASN* 
guide [15] to the protection of standard nuclear installations from external flooding 
proposes the following approach: 

After gathering available information thoroughly enough to ensure we have found 
all that can be found, it is necessary that the entire available statistical sample be 
used to determine the probability distribution of threshold-exceeding rare events. In 
addition to regularly recorded data from observation stations, any historical data 
from earlier times should be taken into account. Given the fact that these sources 
of data are limited, concerning generally only relatively short periods of time, it is 
essential to construct a confidence interval for the calculated mean value. In order to 
cover for uncertainty associated with such data samples, the extrapolated value used 
is the upper bound of a confidence interval associated with each flood risk reference 
scenario. In practice, using the 70% confidence interval for this task is generally 
considered “appropriate” (translated from [15]). 
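One possible way to obtain such an upper bound is a parametric bootstrap, sketched below. The guide [15] is not quoted here as prescribing this method: the Gumbel model, the percentile bootstrap, the two-sided reading of the 70% interval, and the function name `return_level_upper_bound` are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gumbel_r


def return_level_upper_bound(maxima, return_period, level=0.70,
                             n_boot=200, seed=0):
    """Point estimate and percentile bootstrap interval of a return level."""
    maxima = np.asarray(maxima, dtype=float)
    rng = np.random.default_rng(seed)
    prob = 1.0 - 1.0 / return_period
    loc, scale = gumbel_r.fit(maxima)
    estimate = gumbel_r.ppf(prob, loc=loc, scale=scale)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate a sample of the same size from the fitted model and refit.
        resample = gumbel_r.rvs(loc=loc, scale=scale, size=maxima.size,
                                random_state=rng)
        loc_b, scale_b = gumbel_r.fit(resample)
        boot[b] = gumbel_r.ppf(prob, loc=loc_b, scale=scale_b)
    lower, upper = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    return estimate, lower, upper


# Simulated example: 35 annual maxima, 100-year return level.
rng = np.random.default_rng(1)
annual_maxima = gumbel_r.rvs(loc=500.0, scale=80.0, size=35, random_state=rng)
estimate, lower, upper = return_level_upper_bound(annual_maxima, 100)
print(f"estimate = {estimate:.1f}, 70% interval = [{lower:.1f}, {upper:.1f}]")
```

In this reading, the design value retained would be `upper` rather than `estimate`.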


Remark 5.3 For a study conducted with the POT approach, the threshold is usually
chosen so that the number of observations per year is between about 2 and 10 [40]. Any
value lower than this will have to be justified by an expert, or other sampling methods
used instead, such as regionalization (Chap. 7), allowing, under certain conditions,
more data to be collected for the desired analyses. See also Sect. 5.4.2 for formal
threshold-selection approaches.


5.3 Checking for Data Quality 


The quality of measurements must be examined in detail. In particular, noise in 
data must be characterized as it may influence statistical quantification (Sect. 4.1.6). 
Measurement errors need to be known in order to determine, among other things, the 
number of significant digits to keep in calculations. In some cases, the distribution of 
the measurement noise can be estimated by studying the phenomenon under normal 
conditions, either as indicated by the supplier, or via the work of experts (see for 
example [8] for the quantification of uncertainty and measurement errors for tide
gauges, and [254] on the thickness of volcanic ash). It is possible, in such cases, to 
then appropriately take into account noise sources (see for example [62]). 

In order to analyze the data quality, it is also necessary to verify its origin. Which 
organization did the measuring? Are they reliable? Were pre-treatment or quality 
analyses conducted? Verifying the quality of the measuring device’s calibration is 
equally important: what method was used to calibrate it? 


5.3.1 Relevance of Extreme Observations 


Particular care must be taken when checking the larger values, because these will 
greatly influence the statistical fitting of the GEV or GPD distribution and the char- 
acterization of return levels. Such verification generally requires the input of several 
scientific disciplines. 


Example 5.8 (Extreme tides) 


It is standard practice to check that the ten (at least) largest extreme tide
recordings can be linked with strong depressions (and hence severe storms) in
the area where they were taken, using databases like those available at
http://www.meteociel.fr.


In the event that no extreme event was detected in an area and on a date where a 
particularly large data value was recorded, it is permitted to remove the data point, 
specifically stating why it was removed. A major difficulty is that many methods for 
detecting such outliers are themselves based on the use of extreme values [132, 697], 
although solutions have recently been proposed to distinguish these aberrant values 
from truly “extreme” ones [300] and applied to the case of extreme rainfall. If, after 
analysis, there is doubt, it is better to simply keep the data. Data removal should not 
occur without particularly rigorous and transparent reasoning and justification. 


Example 5.9 (Extreme low tides) 


If we want to verify the validity of a sequence of values of extreme low tides 
(undertides), it may be useful to conduct an analysis of the associated mete- 
orological conditions. In this example, recordings are from the Saint-Servan 
tide gauge (in the French county of Ille-et-Vilaine): the undertide observed on
22/11/1906 was large (96 cm) and among the largest undertides ever recorded at
Saint-Servan. There is no information available that would allow us to remove
it.

Fig. 5.2 Isobars of pressure at sea level (white lines) in hPa, with French annotations
(Source http://www.meteociel.fr)

Indeed, the reconstruction of the atmospheric pressure and wind at the time,⁴
certainly impacted by uncertainty in view of how long ago the measurements 
were taken, reveals the presence of a deep depression over the North Atlantic, 
with an anticyclone centered on France, with a high pressure of about 1030 
hPa near Saint-Servan (the average pressure at sea level is about 1013 hPa) and 
wind coming from the south-west. Figure 5.2 shows the reconstructed pres- 
sure fields for 22/11/1906. This situation is a priori favorable in leading to a 
rather low sea level. Moreover, in the Saint-Servan low tide record, large
undertides (mostly greater than 30 cm) were observed between 09/11/1906
and 28/11/1906, i.e., in the temporal range of the 22/11/1906 event.

This situation corresponds rather well to a persistence of high pressure around 
22/11/1906. These various points are in favor of maintaining the measurement 
of 22/11/1906 in the Saint-Servan time series. Nevertheless, the observed high 
pressure values are not really out of the ordinary; it is quite likely that the value 
of 96 cm, which is relatively large, was overestimated. As there is no evidence 
to prove this, the value for 22/11/1906 must be retained in the final sample 
used for the statistical study. 


4 The reconstruction can be found at
http://www.meteociel.fr/modeles/archives/archives.php?mode=0&month=11&day=22&year=1906&map=0&hour=12
Other ways of checking the validity of data can be found in various disciplines:


• Hydrology. It is advisable to check the coherence of flood volumes at one
station with the measured volumes at nearby stations. It is also useful to compare
sequences of daily maximal flows with pointwise maximal flows, and to study the
quality and accuracy of calibration curves (see Example 4.7) to evaluate
uncertainty in available flow measurements [14].

• Maritime hydraulics. In addition to what has been suggested in the previous
examples, it is often useful to compare one site’s tide level series with that from a
neighboring tide gauge.⁵ Furthermore, aberrant and isolated sea level values can
be detected graphically when measurements are taken with a time step of the order
of a few minutes (from 1 to 10 min) [14]. Also, an analysis of meteorological
conditions observed in the days preceding a suspicious measurement often makes it
possible to determine whether a particularly extreme high or low tide could have
been possible.


5.3.2 Detection of Changepoints and Outliers 


Another characteristic of measuring systems must not be forgotten: they evolve with 
time. The quality of measurements when passing from one system to another must 
be examined closely. In this way, it becomes possible to map the most biased mea- 
surements onto the most reliable ones if, for example, we notice a constant bias 
between the two systems (see, for example, [465] for an illustration on old tide level 
measurements). More generally, statistical tests can be used to detect the presence 
of changepoints, unexplained trends, and aberrant values, due to poor-quality or 
erroneous measurements in the observed data time series. For example: 


• Alexandersson’s test was developed to detect changes in rainfall time series; it
can be applied to sequences of ratios comparing observations from one measuring
station to the average of several [32, 585];

• The cumulative residuals approach is a frequently used method in hydrology for
sequences of flows which consists of analyzing the cumulative difference between
nearby stations so as to check the quality of a given sequence [518, 585, 640].
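As an illustration of the first item, the statistic of the standard normal homogeneity test (the usual form of Alexandersson's test) can be sketched as below for a series of ratios between a candidate station and a reference average. Published critical values are tabulated in the literature; the permutation p-value used here is a swapped-in, assumption-light alternative, and the function names `snht` and `snht_pvalue` are hypothetical.

```python
import numpy as np


def snht(series):
    """Return the SNHT statistic and the most likely changepoint position.

    The position k is the number of observations before the break.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)      # standardized series
    k = np.arange(1, n)                     # candidate sizes of the first segment
    csum = np.cumsum(z)[:-1]                # sums of z_1..z_k
    mean_before = csum / k
    mean_after = (z.sum() - csum) / (n - k)
    t = k * mean_before**2 + (n - k) * mean_after**2
    best = int(np.argmax(t))
    return float(t[best]), int(k[best])


def snht_pvalue(series, n_perm=999, seed=0):
    """Permutation p-value of the SNHT statistic (assumes exchangeability under H0)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    t_obs, _ = snht(x)
    t_perm = np.array([snht(rng.permutation(x))[0] for _ in range(n_perm)])
    return (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)


# Simulated ratio series with an artificial shift after 60 observations.
rng = np.random.default_rng(2)
ratios = np.concatenate([rng.normal(1.00, 0.05, 60), rng.normal(1.10, 0.05, 40)])
t0, k_hat = snht(ratios)
print(f"T0 = {t0:.2f}, estimated changepoint after {k_hat} observations")
print(f"permutation p-value = {snht_pvalue(ratios, n_perm=199):.3f}")
```

A detected changepoint should then be confronted with the station's metadata (instrument change, relocation) before any correction is applied.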


Possible trends can also be linked to the non-stationarity of phenomena. Tools for
testing this can be found in Sect. 5.4.3. Finally, [266] describes an analysis method
for discriminating measurement noise from a real extreme value of a climatic
phenomenon (applied to temperatures).


5 Information on sea level measurement quality control is provided by SHOM at
http://www.sonel.org/_Controle-qualite-des-mesures.html.
5.3.3 Processing Roundness and Measurement Noise


Another important practical problem is raised by the roundness affecting the
measurements. Since the statistical theory of extremes applies to continuous variables,
roundness must be such that no data value is repeated (because a tie is a zero-probability
event in the continuous framework). Another practical tip, indicated by [766] (for
different statistical distributions) and taken from Sect. 4.3 of [484], is to choose an
accuracy such that the existence of classical statistical estimators of quantities of
interest, such as a return level, remains assured. More precisely, it is a question of
verifying that the numerical estimation procedure remains applicable. The validity of
the calculations performed with sufficiently precise roundness is guaranteed by
theoretical results obtained by [431, 484] (Sect. 4.4) and then [267].

The presence of numerical roundness can be interpreted as the effect of a centered
noise degrading the measurement accuracy [265], which can be assumed to be
incorporated into a broader observational noise. In this sense, it is essential to seek
to “perturb” the original data with a similar noise; the data thus produced remain
extreme and constitute a legitimate sample that could have been observed. If the noise
applied takes more precise numerical values than the roundness of the original data,
the repetition mechanisms within the data vanish and it becomes possible to apply
the theory of extreme values. This jittering approach, available in many software
tools (usually to facilitate data representation [275]), has been successfully tested
on many cases by the authors of this book.

Strictly speaking, the statistical quantification must be based on the convolution
mechanism described in Sect. 4.1.6 and derived from an equation such as (5.1). An
empirical way to implement this principle is to apply a bootstrap procedure by
reproducing the study for a large number of simulated noise samples and then averaging
the estimation results. The choice of the noise to be injected (generally uniform or
Gaussian), if the observation error is not well known, depends on the stability of these
results. However, some study variables remain poorly adapted to the application of such
a process. Thus, EDF studies have shown that the jittering of temperature
measurements generally poses little difficulty, in the sense that the traditionally
calculated return levels remain stable, while wind speed measurements are often
insufficiently precise to allow an addition of noise that does not significantly modify
these return levels.
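The repeated-jittering procedure described above can be sketched as follows. The uniform noise of width equal to the rounding precision, the Gumbel model, and the function name `jittered_return_level` are assumptions chosen for illustration; the spread of the results across repetitions gives a rough indication of how stable the return level is with respect to the injected noise.

```python
import numpy as np
from scipy.stats import gumbel_r


def jittered_return_level(rounded_maxima, precision, return_period,
                          n_rep=200, seed=0):
    """Average return level over repeated jittering of rounded data."""
    rng = np.random.default_rng(seed)
    x = np.asarray(rounded_maxima, dtype=float)
    prob = 1.0 - 1.0 / return_period
    levels = np.empty(n_rep)
    for r in range(n_rep):
        # Centered uniform noise, no wider than the rounding step.
        noise = rng.uniform(-precision / 2, precision / 2, size=x.size)
        loc, scale = gumbel_r.fit(x + noise)
        levels[r] = gumbel_r.ppf(prob, loc=loc, scale=scale)
    return levels.mean(), levels.std(ddof=1)


# Simulated maxima rounded to the nearest unit (ties are likely).
rng = np.random.default_rng(1)
rounded = np.round(gumbel_r.rvs(loc=30.0, scale=3.0, size=40, random_state=rng))
print("number of distinct values:", np.unique(rounded).size, "out of", rounded.size)
mean_level, spread = jittered_return_level(rounded, precision=1.0,
                                           return_period=100, n_rep=50)
print(f"100-year level: mean = {mean_level:.2f}, spread = {spread:.3f}")
```

A spread that is large compared with the mean level suggests that the variable is too coarsely measured for jittering to be harmless.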

Note finally that the statistical quantification on the observed or jittered variable

$$X^* = X + \varepsilon$$

provides a conservative representation of the real hazard $X$, in the sense that the
estimated variability of this hazard is greater than that of the real hazard: under an
independence assumption,

$$\mathbb{V}[X^*] = \mathbb{V}[X] + \mathbb{V}[\varepsilon].$$

However, this does not automatically mean that the extreme return levels calculated
assuming that $X^*$ follows an extreme value distribution are always higher (and
therefore conservative) than the actual return levels of $X$. It is, therefore, essential to
ensure the robustness of the jittered estimates when $\varepsilon$ is simulated.


5.3.4 The Case for Simulated Data 


Certain situations lead to the use of simulated data. For example, [780] uses the 
ANEMOC2* database [744] to obtain data on highly significant wave heights. The 
simulated data used needs to have been checked to make sure it also covers extreme 
events (it may only deal with data values close to the mean). 


5.4 Statistical Characterization 


Independence and stationarity (in frequency, intensity, and length) of observations are 
fundamental hypotheses underlying the use and fitting of extreme value distributions. 
We look closer at these in the following sections using statistical tests (Sect. 4.1.3)
such as those found in Excel, R, etc., depending on the chosen sampling scheme
(MAXB or POT). For POT, the (necessary) choice of a satisfactory threshold can
be tested using a specific procedure.


5.4.1 Testing Independence Between Data 


The hypothesis of independence of exceedances is crucial for the POT approach: the
maxima have to be independent. Maxima found next to (or within) a major peak need
to be removed. If we are working with annual maxima (MAXB approach), we can
generally suppose that data are independent. However, in every case, we need to test
this. The value of the autocorrelation coefficient, and a test of the Poisson hypothesis
(for the POT approach), allow us to accept (or indeed reject) the independence hypothesis.


5.4.1.1 Autocorrelation Indicators and Tests 


The linear correlation coefficient (Pearson’s) measures the quality of a linear
relationship between two variables [207]. One can simply plot one variable in terms of
the other (for instance, a variable at time t in terms of the same variable at time t − 1)
to get an idea of whether they are linearly related. A standard method for testing for
independence uses, as hinted above, autocorrelation.
Table 5.1 Selected nonparametric tests of independence or correlation of univariate sample
elements. Nonasymptotic distributions are provided between parentheses on purpose

| Test | Spearman-Student | Modified Mann-Kendall | Durbin-Watson (DW) | C-statistic |
|---|---|---|---|---|
| Null hypothesis H0 | Data independence | Data independence | Uncorrelated linear model residuals | Uncorrelated data |
| Alternative hypothesis H1 | Increasing or decreasing trend | Increasing or decreasing trend | Order 1 correlated residuals | Order 1 correlated data |
| Minimal sample size (in practice) | 10 > n > 5 (Spearman); 30 > n > 11 (Student) | 10 > n > 5 (Kendall) | n > 8 | n > 8 |
| Remarks | | | Underlying linear model hypothesis | More robust than DW |

References: [339]; [236, 685]; [43, 208, 749, 793]


In statistics, the autocorrelation of a discrete time series is simply the correlation
between the process and an offset version of itself. For real-valued time series, the
empirical autocorrelation function (ACF) is defined by

$$\mathrm{ACF}_n(h) = \frac{1}{(n-h)\,\hat{\sigma}_n^2} \sum_{i=1}^{n-h} (x_i - \bar{x}_n)(x_{i+h} - \bar{x}_n),$$

where

$$\hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2,$$

$\bar{x}_n$ is the empirical mean, and $h$ is the time offset or lag at which the
autocorrelation is calculated. For iid variables, the (theoretical) autocorrelation
function is given by

$$\mathrm{ACF}(h) = \begin{cases} 1 & \text{if } h = 0, \\ 0 & \text{if } 0 < h < n. \end{cases}$$

Several nonparametric statistical tests, involving estimators of the autocorrelation
function (or some related function), are listed in Table 5.1 for their relatively good
behavior when applied to samples of small to moderate size. They can appropriately
accompany the tests described hereafter, which are more specific to the extreme value
context.
Example 5.10 (Successive exceedance peaks)


To test independence of n successive exceedance peaks, we can calculate the
autocorrelation coefficients between the selected events, then the associated
Student’s statistics. We then have to check whether these statistics are compatible
with Student’s distribution with n − 2 degrees of freedom, for a chosen risk of
error α. Other possible approaches are more domain-specific (see the following
example).


Example 5.11 (Extreme high tides) 


When looking at POT data for extreme high tides, it is acceptable to keep
two high values close in time if and only if: (a) they are above the chosen
threshold u_0 (above which we consider that an extreme event has occurred), and
(b) between these two high values, there is another value less than or equal to
u_0/2. This criterion, called redescent [433, 434], allows us to suppose that
the two close-together high values come from different storms and are thus
independent.
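The redescent criterion of Example 5.11 can be sketched as a simple scan of the series: an event stays open until the level drops to u_0/2 or below, and only the largest value of each event is kept. The function name `redescent_peaks` is hypothetical, and the sketch assumes positive levels (such as surges) recorded at regular time steps.

```python
def redescent_peaks(levels, u0):
    """Return indices of event maxima separated by a drop to u0 / 2 or below."""
    peaks = []
    current = None  # index of the running maximum of the open event
    for i, value in enumerate(levels):
        if value > u0:
            # Exceedance: open an event or update its running maximum.
            if current is None or value > levels[current]:
                current = i
        elif value <= u0 / 2 and current is not None:
            # Redescent reached: close the event and keep its maximum.
            peaks.append(current)
            current = None
    if current is not None:
        peaks.append(current)
    return peaks


# Two exceedances (1.2 and 1.5) belong to the same event: no redescent between them.
surge = [0.1, 1.2, 0.8, 1.5, 0.3, 1.1, 0.2]
print(redescent_peaks(surge, u0=1.0))
```

In this toy series, the values at positions 1 and 3 are not separated by a value below 0.5, so only the larger of the two is retained as an independent peak.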


An example of calculating the autocorrelation coefficient for a study of strong
winds is given in Fig. 5.3. The confidence interval for the hypothesis ACF(h) = 0
can be found using the normal distribution with mean 0 and standard deviation
(n − 1)^{-1/2}.
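A minimal sketch of this calculation is given below: the empirical autocorrelation is computed for a few lags and compared with a band derived from a centered normal distribution with standard deviation (n − 1)^{-1/2}. The normalization by (n − h) times the empirical variance and the function names are assumptions made for the example; other conventions (division by n) are also common.

```python
import numpy as np


def empirical_acf(x, max_lag):
    """Empirical autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    var = np.mean(centered**2)
    acf = np.empty(max_lag + 1)
    for h in range(max_lag + 1):
        # Average product of the series with its copy shifted by h steps.
        acf[h] = np.sum(centered[: n - h] * centered[h:]) / ((n - h) * var)
    return acf


def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band under ACF(h) = 0."""
    return z / np.sqrt(n - 1)


# Simulated independent daily values: most lags should stay inside the band.
rng = np.random.default_rng(3)
wind = rng.gumbel(loc=20.0, scale=4.0, size=365)
acf = empirical_acf(wind, max_lag=10)
band = white_noise_band(wind.size)
outside = np.flatnonzero(np.abs(acf[1:]) > band) + 1
print("band half-width:", round(band, 3), "| lags outside the band:", outside)
```

Since about 5% of lags are expected to leave the band by chance, an isolated excursion is not by itself evidence of dependence.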


5.4.1.2 Testing the Poisson Hypothesis (POT Approach) 


When modeling extreme values using thresholds (POT approach), it is necessary to 
test the Poisson hypothesis, i.e., that the distribution of the number of values per 
year above a threshold follows a Poisson distribution (Sect. 4.1.2.1). Testing this also 
means testing the absence of overly-strong dependency in the initial sample. The 
adequacy of the distribution of the number of observations per year retained in the
POT sample with respect to the Poisson distribution can either be examined visually
or using a classical statistical test, such as a χ² test (Sect. 4.1.3).
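A χ² goodness-of-fit check of the yearly exceedance counts against a Poisson distribution can be sketched as follows. The binning (one bin per count value, with a final open-ended bin), the function name `poisson_chi2_test`, and the example counts are assumptions for illustration; with few years of data, the expected counts per bin are small and the χ² approximation is rough, so bins may need to be merged.

```python
import numpy as np
from scipy.stats import chisquare, poisson


def poisson_chi2_test(yearly_counts, max_bin=None):
    """Chi-square test of yearly exceedance counts against a fitted Poisson."""
    counts = np.asarray(yearly_counts, dtype=int)
    n_years = counts.size
    lam = counts.mean()  # estimated Poisson rate (events per year)
    if max_bin is None:
        max_bin = int(counts.max())
    # Bins 0, 1, ..., max_bin - 1, and a last bin ">= max_bin".
    observed = np.array(
        [np.sum(counts == k) for k in range(max_bin)] + [np.sum(counts >= max_bin)],
        dtype=float,
    )
    expected = n_years * np.append(
        poisson.pmf(np.arange(max_bin), lam), poisson.sf(max_bin - 1, lam)
    )
    # One extra degree of freedom is lost because the rate is estimated.
    stat, p_value = chisquare(observed, expected, ddof=1)
    return stat, p_value


# Illustrative numbers of threshold exceedances in 20 successive years.
yearly = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4, 2, 7, 3, 4, 5, 1, 4, 3, 6, 4]
stat, p_value = poisson_chi2_test(yearly)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3f}")
```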


5.4.1.3 Detecting Dependency 


If the previous tests do indicate dependency in the data, this can be dealt with in 
theory using stochastic processes (Sect. 4.1.5). As long as the dependency is not too 
strong, extreme value theory can still be used. However, it must not be forgotten that 
dependency automatically decreases the quality of asymptotic approximations. 
Fig. 5.3 Autocorrelation diagram for a wind strength time series. The dotted line is the confidence
interval under the null hypothesis of autocorrelation 0. The time step is days; the small value 
calculated is thus compatible with the hypothesis 


Following the criteria formally defined by [463], two events x_i and x_j, “above
a certain threshold”, for which the indices i and j are “sufficiently separated”, can
be considered independent even if they are found in a dependent dataset (which 
still needs to be stationary). The choice of a MAXB approach requires choosing 
the block size (periods), so that tests of independence, applied to extreme values, 
give results that do not refute independence. The choice of a POT approach requires 
declustering* techniques, which will be described in Chap. 7. 

From a practical point of view, the concept of “sufficiently separated” data requires 
knowledge of the system’s physics that could allow the definition of a distance above 
which data can be considered independent. This choice strongly depends on the 
phenomenon in question. For example, a time period of three days was used to 
define two independent extreme high tides by [57], whereas for certain water flow 
studies in large water basins, time periods of over ten days are often used. 

In addition to calculating the autocorrelation coefficient, calculation of the 
extremal index provides an estimate of the distance at which two observations can 
be considered independent. This coefficient is used to overcome the fact that, in gen- 
eral, dependency in a time series can take different forms, and that it is impossible to 
develop a generalization of the behavior of extreme values without defining a precise 
practical framework. To define such a framework, it is necessary to introduce a mea- 
sure of the temporal dependency between data points which are above high-valued 
thresholds (see Definition 6.9 for a formal characterization). 
Definition 5.1 Extremal index. The extremal index (or coefficient) θ, such that
0 < θ ≤ 1, is formally defined in Chap. 9 (Definition 6.9). The closer the value of
θ is to 0, the more the extreme values have a tendency to cluster (or aggregate, as
groups of associated values above a fixed threshold). The index can be interpreted
and estimated in several ways:


1. as the inverse of the limiting average cluster size in the data series, the limit being
defined by increasing the threshold; the original dataset of n dependent values
brings as much information as an iid sample of size nθ;

2. as the limiting probability that a point over the threshold will be followed by a 
sequence of values below it; 

3. the proportion of null times between two threshold excesses. 


We will focus on the first interpretation in this chapter. Therefore, the extremal
index can be estimated by

$$\hat{\theta} = \frac{n_c}{n_u},$$

where n_u is the number of times a threshold u was exceeded, and n_c the number of
clusters obtained with this threshold. It therefore suffices to calibrate (u, n_u, n_c) so
as to obtain a sufficiently large equivalent number of iid data nθ.
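The ratio of the number of clusters to the number of exceedances can be computed as in the sketch below. Clusters are identified here with a simple runs rule (a new cluster starts when more than `run_length` time steps separate two successive exceedances); this rule, the parameter `run_length`, and the function name `runs_extremal_index` are assumptions for illustration, since the declustering techniques themselves are described later in the book.

```python
import numpy as np


def runs_extremal_index(series, u, run_length):
    """Estimate the extremal index as n_c / n_u with a runs declustering rule."""
    x = np.asarray(series, dtype=float)
    exceed_idx = np.flatnonzero(x > u)
    n_u = exceed_idx.size
    if n_u == 0:
        raise ValueError("no exceedance of the threshold")
    # A new cluster starts when the gap since the previous exceedance
    # is larger than run_length time steps.
    gaps = np.diff(exceed_idx)
    n_c = 1 + int(np.sum(gaps > run_length))
    return n_c / n_u, n_u, n_c


# Toy series: six exceedances of u = 1 grouped into three clusters.
toy = [0, 2, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 2, 2]
theta, n_u, n_c = runs_extremal_index(toy, u=1.0, run_length=3)
print(f"theta = {theta:.2f} (n_u = {n_u}, n_c = {n_c})")
```

Repeating the calculation for several thresholds and run lengths shows how sensitive the estimate, and hence the equivalent sample size nθ, is to these choices.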


5.4.2 Peaks Occurrence Testing (POT Approach) 


The test for the occurrence of peaks works by comparing the empirical distribution
of the occurrence of exceedances to that of the theoretical Poisson (Sect. 4.1.2.1)
and negative binomial distributions. A χ² test is then run to compare the observed
frequency of exceeding the threshold k times in a reference period to the probability
of this event under a Poisson distribution or a negative binomial distribution
BN(n, p), whose mass function is

$$P(K = k) = \frac{\Gamma(n+k)}{k!\,\Gamma(n)}\, p^{n} (1-p)^{k}, \qquad k = 0, 1, 2, \ldots$$

Experience shows that in most cases, and if the threshold is well-chosen, the Poisson
distribution works well for this.

This test, therefore, makes it possible to check whether the threshold has been 
chosen well and also provides a comparison with the representation of events pro- 
posed by the negative binomial distribution. If the Poisson fit is not satisfactory, it is 
advisable to start again and test with a new threshold. Another test, called the uniform 
distribution test, also makes it possible to judge the Poisson nature of the occurrence 
of maxima (see Sect. 5.4.3.3). 
5.4.3 Checking Stationarity


Stationarity of the process generating extreme valued data (Sect. 4.1.5 and Definition 
4.1) can be interpreted as meaning that the observed extreme values follow the same 
distribution and do not change their behavior and frequency over time. 


5.4.3.1 Reasons for Non-stationarity 


In general, potential non-stationarity can be detected either in the original full dataset 
or in the sample of extreme values. The absence of a predefined link between these 
two types of non-stationarity obliges us to check for both. 

The presence of cycles (the typical example in meteorology being the seasons), 
trends (extreme values tend to increase or decrease over time), and changepoints 
(the extreme phenomenon begins after a certain date) are all signs of non-stationarity 
[641]. Appropriate statistical tests are described in Sects. 5.4.3.2 and 5.4.3.3. 

The stationarity hypothesis can also be questioned if we go far back into the 
past and use historical data. The fact that the climate evolves over time makes this 
a legitimate concern [161, 517], but the apparent non-stationarity of a particular 
phenomenon cannot always be proven. For instance, no significant change in storm 
intensity in Europe has been observed in recent decades [272]. Another example 
from [641] has shown that most statistically significant trends detected in a large 
database of flows in France were due to measurement error or drift. Below is another 
example. 


Example 5.12 (Correcting excess high tide calculations) 


The open sea high tide exceedance* or residual is the difference (if positive) 
between the observed level of high tide and the predicted one in open sea, the 
time at which these two occur being possibly different (see Fig. 5.4). For this 
reason, the meaning of instantaneous excesses, calculated as the difference 
at the same time instant between measured and predicted levels, adds con- 
fusion (it can potentially be much larger, as shown in Fig. 5.4). Undertaking 
an extreme value statistics study on high tide exceedances measured at a port 
requires both tide level measurements and retro-predictions of the theoretical 
tide level there [40]. 

To this end, Weiss [780] studied sea level data from Newlyn (Cornwall,
England) and showed that there was a trend linked to the sea level. Here,
eustatic change* is calculated according to the rules of the Permanent Service
for Mean Sea Level⁶ (PSMSL) using time averages from tide gauges over
periods of one day, one month, and one year. The calculations showed that the
mean sea level at Newlyn is rising at a rate of 1.8 mm/year (Fig. 5.5). This means
that retro-predicted values need to be corrected for this eustatic change, so as
not to bias the calculation of high tide exceedances. The tide level measurement
at a date T, therefore, needs to be modified by a value of (2010 − T) × a,
where a = 1.8 mm/year. The corrected high tide exceedances are shown in
Fig. 5.6. We see that the trend has disappeared (it would still be there if no
eustatic change correction had been made), so now the time series can be used
to provide extreme data (whether independent or not) for a statistical study.
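The correction (2010 − T) × a amounts to a one-line calculation, sketched below. The function names are hypothetical, and the choice of adding the correction to past levels (so that they are expressed relative to the 2010 mean sea level) is an assumption about the sign convention, which the text does not spell out.

```python
def eustatic_correction_mm(year, rate_mm_per_year=1.8, reference_year=2010):
    """Correction (reference_year - year) * rate, in millimetres."""
    return (reference_year - year) * rate_mm_per_year


def corrected_level_mm(level_mm, year, rate_mm_per_year=1.8, reference_year=2010):
    """Level brought to the reference year (assumed convention: addition)."""
    return level_mm + eustatic_correction_mm(year, rate_mm_per_year, reference_year)


# A level recorded in 1910 is shifted by a century of mean sea level rise.
print(eustatic_correction_mm(1910), "mm")
print(corrected_level_mm(7000.0, 1910), "mm")
```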


6 http://www.psmsl.org/data/supplying.


Other issues related to stationarity should also be taken into account. For instance, 
historical data may not correspond to samples of a random phenomenon similar 
to that currently observed, for reasons other than climate. For example, historical 
floods of a river whose bed has evolved over time may lose all or part of their 
significance; see [233] for an analysis applied to the Isère river (France). For some
historical data that seem to be outliers from the rest of the sample, it is also possible
that the reasons for the discrepancy cannot be identified, as illustrated in the following
example.


[Figure 5.4: tide level versus time step, showing the theoretical tide and the observed tide]

Fig. 5.4 Illustration of the calculation of high tide exceedances, adapted from [40]. S_pm indicates
the high tide exceedance in open sea, while S_i indicates the instantaneous high tide exceedance


[Figure 5.5: mean yearly sea level (mm) versus year, 1920 to 2000, with a linear regression line]

Fig. 5.5 The evolution of the annual mean sea level at Newlyn (adapted from [780])


[Figure 5.6: high tide excesses (with eustatic correction), in m, versus date, from 1915 to 1996]

Fig. 5.6 High tide exceedances calculated with high accuracy at Newlyn using a corrected
retro-projection of sea levels. This plot was constructed using tide level data from SHOM and
EDF software for retro-predicting tide levels


Fig. 5.7 The flow of the Rhine river at Kembs (Source EDF-DTG)


Example 5.13 (Fluvial hazards) 


Changes affecting river basins and human usage of rivers and streams give rise 
to doubts as to the stationarity of flow distributions. Figure 5.7 illustrates the 
worries we should have about this: the sample showing the flow rate of the 
Rhine since 1999 seems stationary. But if we include historical floods of the 
nineteenth century, this appearance of stationarity disappears. 

Is it because of a changing climate? Have the Swiss strongly influenced the 
behavior of the Rhine due to civil engineering works [771]? Is a century of 
data sufficient to consider building a statistical model? Has the measurement 
system changed? If now observations made between 1995 and 2010 are taken 
into account, it is legitimate to ask whether, if we wait a little longer, the 
appearance of stationarity might return. It is difficult to answer such questions! 


5.4.3.2 Tests of Stationarity 


The stationarity hypothesis excludes the possibility of any large-period cyclical vari- 
ation and any systematic change in the phenomenon over time. Some important 
nonparametric tests for detecting trends or changepoints are presented below. Other 
tests, known as parametric, make it possible to compare models with and without
trends (e.g., likelihood ratio tests). These will be presented later in this
book.


Testing means and variances (Fisher-Student). The data sample $\mathbf{x}_n = (x_1, \ldots, x_n)$
is divided into two sub-samples $\mathbf{x}_{n_1}$ and $\mathbf{x}_{n_2}$ (of separate measurement periods). The
means $\mu_i$ and variances $\sigma_i^2$ associated with these two sub-samples are estimated by
the classical empirical estimators $(\bar{X}_{n_1}, \bar{X}_{n_2})$ and $(S_{n_1}^2, S_{n_2}^2)$. First, the hypothesis $H_0^{(1)}$ of
variance equality is tested (Fisher’s test) by considering the statistic

$$T_1 = \frac{n_1 (n_2 - 1)}{n_2 (n_1 - 1)} \, \frac{S_{n_1}^2}{S_{n_2}^2} \, \frac{\sigma_2^2}{\sigma_1^2} \qquad (5.1)$$

that follows a Fisher-Snedecor $\mathcal{F}(n_1 - 1, n_2 - 1)$ distribution under $H_0^{(1)}$. If the test
based on the observed value of $T_1$ (defined by removing the $\sigma_i$ in (5.1)) does not reject
the equality of variances, then Student’s procedure (t-test) testing the hypothesis
$H_0^{(2)}$ of equality of means is conducted. It considers the statistic

$$T_2 = \frac{\sqrt{n_1 + n_2 - 2}}{\sqrt{1/n_1 + 1/n_2}} \, \frac{(\bar{X}_{n_1} - \bar{X}_{n_2}) - (\mu_1 - \mu_2)}{\sqrt{n_1 S_{n_1}^2 + n_2 S_{n_2}^2}} \qquad (5.2)$$

following a Student $\mathcal{T}(n_1 + n_2 - 2)$ distribution under $H_0^{(2)}$. This test is conducted
using the observed statistic defined by removing the $\mu_i$ in (5.2). See [207]
for more details.
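
As an illustration, the two observed statistics can be computed as follows. This is a minimal sketch, assuming that the empirical variances are the divide-by-n versions and that the statistics are evaluated under the null hypotheses (equal variances, equal means); the function name is hypothetical. The returned values still have to be compared with Fisher-Snedecor and Student quantiles taken from tables or from a statistics library.

```python
import math


def fisher_student_statistics(sample1, sample2):
    """Observed Fisher (variance) and Student (mean) statistics of two sub-samples."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    # Divide-by-n empirical variances (assumed convention)
    s1 = sum((x - m1) ** 2 for x in sample1) / n1
    s2 = sum((x - m2) ** 2 for x in sample2) / n2
    # Fisher statistic under equal variances: compare to F(n1 - 1, n2 - 1)
    t1 = (n1 * (n2 - 1) * s1) / (n2 * (n1 - 1) * s2)
    # Student statistic under equal means: compare to Student(n1 + n2 - 2)
    t2 = (math.sqrt(n1 + n2 - 2) * (m1 - m2)) / (
        math.sqrt(1 / n1 + 1 / n2) * math.sqrt(n1 * s1 + n2 * s2)
    )
    return t1, t2


t1, t2 = fisher_student_statistics([1, 2, 3, 4], [2, 3, 4, 5])
```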


Mann-Kendall test. The Mann-Kendall test [432, 499] is used to detect a trend. For
each element $x_i$ of a sequence of $n$ terms, we calculate how many of the preceding
elements are smaller than it, which we call $C_i$. The test then operates on the value of
$t$, which is equal to the cumulative value

$$t = \sum_{i=1}^{n} C_i.$$

If there are at least 10 elements, then the value of $t$ approximately follows a normal
distribution with mean

$$E[t] = \frac{n(n-1)}{4}$$

and variance

$$V[t] = \frac{n(n-1)(2n+5)}{72}.$$

The asymptotically Gaussian test statistic is given in terms of the standardized difference $u(t)$
between $t$ and $E[t]$, i.e.

$$u(t) = \frac{t - E[t]}{\sqrt{V[t]}}.$$

With the help of a standardized normal distribution table, this difference is converted into
the probability of randomly obtaining a value greater than $|u(t)|$; the absence of a trend is
rejected when this probability is smaller than the chosen significance level.
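
The procedure can be sketched as follows, using only the Python standard library. The function name and the two-sided p-value convention are assumptions made for the illustration; the count is applied to the whole sequence of n terms.

```python
from statistics import NormalDist


def mann_kendall(x):
    """Return (t, u, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(x)
    # C_i: number of earlier elements strictly smaller than x_i; t is their sum
    t = sum(sum(1 for j in range(i) if x[j] < x[i]) for i in range(n))
    mean = n * (n - 1) / 4
    var = n * (n - 1) * (2 * n + 5) / 72
    u = (t - mean) / var ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(u)))
    return t, u, p_value


# A strictly increasing sequence of 12 values gives a strongly positive u
t, u, p_value = mann_kendall(list(range(12)))
```

The double loop costs O(n²) operations, which is acceptable for the sample sizes usually met with annual or seasonal series.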


Pettitt’s test. This test [612] detects changepoints in time series. Let $k$ be an integer
between 1 and $n - 1$. We calculate:

$$U(k) = \sum_{i=1}^{k} \sum_{j=k+1}^{n} \mathrm{sign}(x_i - x_j).$$

The test statistic is then

$$Z = \max_{1 \le k \le n-1} |U(k)|,$$

whose cumulative distribution function is known. We can, therefore, calculate the
observed test statistic $z$ and associate it with the level of significance required. A
practical explanation of Pettitt’s test is given by [641].
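
A minimal sketch of the computation is given below. The p-value uses the classical approximation p ≈ 2 exp(−6 z² / (n³ + n²)), which is not stated in the text and is introduced here as an assumption; the function names are hypothetical.

```python
import math


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def pettitt(x):
    """Return (k, z, approximate p-value): most likely changepoint and statistic."""
    n = len(x)
    best_k, best_u = 0, 0
    for k in range(1, n):
        # U(k) compares every value up to position k with every later value
        u_k = sum(sign(x[i] - x[j]) for i in range(k) for j in range(k, n))
        if abs(u_k) > abs(best_u):
            best_k, best_u = k, u_k
    z = abs(best_u)
    p_approx = min(1.0, 2 * math.exp(-6 * z ** 2 / (n ** 3 + n ** 2)))
    return best_k, z, p_approx


# Hypothetical series with a jump after the tenth value
series = [0.1 * i for i in range(10)] + [10 + 0.1 * i for i in range(10)]
k, z, p_approx = pettitt(series)
```

This direct implementation costs O(n³) operations; it is meant to mirror the formula, not to be efficient.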


5.4.3.3 Test of Uniform Distribution (POT Approach) 


This test examines whether there is a uniform distribution of occurrences of the 
random variable X. It is used only for POT-type samples [518] and provides back-up 
to the stationarity hypothesis for points exceeding the threshold. It also helps to test 
the Poisson nature of occurrences of exceedances. 

With the help of this test, we can check whether the dates at which peaks occur over 
the observation period are uniformly distributed, i.e., that the cumulative number of 
peaks selected since the start of the observation period is linear with respect to time 
elapsed. The test, therefore, analyzes the occurrence dates of all over-threshold peaks
and compares the theoretical expectation or mean (resp. the standard deviation) of
occurrence dates with the observed mean (resp. standard deviation). A consideration
of the distance between the two allows us to determine whether the distribution of 
peaks is uniform or not, for a given error risk (e.g., 5%). An illustration of this test 
on flood data is given in Fig. 5.8. 
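
One possible way to implement the comparison of means is sketched below. It assumes that, under uniformity over the observation period, the mean of the n occurrence dates is approximately Gaussian with expectation equal to the middle of the period and standard deviation equal to the period length divided by √(12n); this normal approximation and the function name are assumptions, not the exact procedure of [518].

```python
import math
from statistics import NormalDist, fmean


def uniformity_test(dates, t_start, t_end):
    """Return (z, two-sided p-value) comparing the mean date with its uniform expectation."""
    n = len(dates)
    span = t_end - t_start
    expected_mean = t_start + span / 2
    std_of_mean = span / math.sqrt(12 * n)
    z = (fmean(dates) - expected_mean) / std_of_mean
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical peak dates (in years since the start of a 50-year record)
z, p_value = uniformity_test([i + 0.5 for i in range(50)], 0.0, 50.0)
```

Uniformity is rejected at the 5% risk level when the p-value falls below 0.05. The same idea applies to the standard deviation of the dates, whose theoretical value under uniformity is the period length divided by √12.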


5.4.3.4 Detecting Non-stationarity 


If, with the help of the previous tests, non-stationarity is found or suspected in study 
data, it is recommended to take it into account. Two approaches are possible; the 
exploration of each is advised in practice. 


Fig. 5.8 Test of uniform distribution applied to flood data. The points represent the observations
and the line indicates the uniformity of the flood distribution


1. Correcting for the non-stationarity in the original data, for instance by extracting a
trend and transforming these data into a stationary sequence. This is done, for
example, for the correction of sea level eustatic change (Example 5.12), where a
linear trend (eustatic change) is estimated on the series of annual sea level averages
and then used to correct daily sea level data observed in the past. Note, however,
that correcting the non-stationarity of the “mean” cannot entirely correct the
non-stationarity of the largest values in the sample. It may also be necessary to consider
whether there is an evolution of the variance over time. The stationary extremes of
the corrected series are used for the study, adding to them the previously extracted
trend. This approach remains corrective, since it concerns all data without taking
a specific interest in the deviations from the laws of extremes. Thus, it makes it
difficult to extrapolate.

2. Explicitly taking into account non-stationarity in the extreme values, through the 
assessment of trends in the parameters of the extreme value distributions. This 
approach is detailed in Chap. 8. 


Chapter 6
Univariate Extreme Value Theory:
Practice and Limitations


Anne Dutfoy 


Abstract This chapter presents the classic theory of univariate extreme values and 
its application to various natural hazards described in the previous chapters. The 
limitations of this approach are discussed, which pave the way for the second part of 
this book. 


6.1 Introduction 


6.1.1 Extreme Values for Natural Hazards 


Extreme value studies conducted by engineers and researchers of large energy com- 
panies are generally aimed at quantifying the robustness of an electricity produc- 
tion structure against extreme conditions of climatic or meteorological origin. This 
already begins during the design phase of the structure: the level of robustness
required with regard to extreme natural hazards has a direct influence on the dimen- 
sioning of the structure. The natural hazards we consider are those which can damage 
the functioning of a structure, and also those which reduce its level of safety. For 
example, a structure located on a river needs to be protected against flood. Some 
structures also need to be robust against intense storms so as to protect power supply 
pylons. 

In extreme value studies, extreme values of a natural hazard are defined as those 
which exceed some high threshold value. The definition of an extreme event thus 
requires the choice of this threshold. But which threshold should we use? Very often, 
the threshold is not set directly and is derived from the return period of the extreme 
event. Under certain conditions explained in the following sections, a return period 
can be interpreted as the average duration between two successive episodes of an 
extreme event. Thus, a centennial storm occurs, on average, every 100 years and a 
millennial flood occurs, on average, every 1000 years.

The robustness of a structure can then be tested with respect to a particular return
period of the extreme event. For instance, the height of a (flood protection) dike can 
be calculated from the centennial (or even millennial) river flow; the robustness of a 
structure to wind can be dimensioned with respect to the centennial wind speed, etc. 
To calculate such return levels, statisticians first look at the probabilistic behavior 
of the annual maxima of the quantity considered. This is called the block maxima 
(MAXB) approach, where each block generally corresponds to one year. In the flood 
example, they take into account the largest flow recorded every year over several 
decades, and in the storm example, they analyze the set of highest daily wind speeds 
to extract the maximum value for each year. 

Extreme value theory allows us to determine the asymptotic distributions of such 
annual maxima. These are known as generalized extreme value (GEV) distributions 
and come in three flavors, corresponding to three possible domains of attraction, i.e., 
three types of limit distribution possible for the distribution of maxima of a sample: 
Gumbel, Fréchet, and Weibull. The return level can be easily deduced from the GEV 
distribution: it is simply the GEV’s order (1 — 1/T) quantile if the return period is 
T years. 

Another approach is to consider all of the data above an extreme value threshold 
rather than reducing it to a single data point per period (block). This approach is 
called peaks over threshold (POT). Extreme value theory allows us, under certain
conditions, to quantify the distribution of the magnitudes of the values that exceed
the threshold, which turns out to be a generalized Pareto distribution (GPD). Here the
return level can again be easily deduced as a particular quantile of the distribution. 

Inversely, it is also quite interesting to define an extreme event directly in terms 
of an extreme threshold value and then quantify the annual frequency of the event. 
Though the notion of a return period is well defined, that of the annual frequency 
is less so, and potentially corresponds to several definitions, depending on whether 
we consider the average annual number of days during which the extreme event 
occurs, the average annual number of episodes during which the event occurs, or the 
probability of having at least one episode per year. 


6.1.2 Chapter Contents 


This chapter first recalls two fundamental results in univariate extreme value theory 
without diving too deeply into the theory. A more in-depth treatment can be found in 
[65, 173]. First we present the main result on the asymptotic convergence of annual 
maxima, as well as a convergence result for the distribution of cluster maxima, 
stated in the unifying colored point process framework (Definition 6.8). As indicated 
in Sect.3.1.1.2, the real-world extreme value study techniques we describe in what 
follows are based on these two theorems. We also recall practical difficulties in 
applying them, and show how in practice, faced with data that do not exactly satisfy 
the hypotheses of these fundamental theorems, practitioners try to move back to the 
stationary framework found in the theory. 




The notions of a return period and its associated level, as well as their interpreta- 
tions, are then examined in detail. We also explain the different ways of estimating 
these starting from GEV or GPD distributions, and examine various mathematical 
definitions and forms of the annual frequency of an event. 

The final part of the chapter outlines the different steps involved in a study of 
extreme values, illustrated by two real examples. These steps help us, first of all, to 
preprocess the data in order to bring it back to the stationary framework required by 
the convergence theorems (where possible). Then, we give examples of the inference 
of extreme value models and validation steps for estimated models, before arriving 
at the calculation of actual return levels, as well as the associated annual frequencies. 


6.2 Fundamental Results 


This section recalls two fundamental convergence theorems for univariate extreme 
values. The first allows us to know the asymptotic distribution of annual maxima 
(MAXB framework) and the second is that of the exceedances of cluster maxima 
(POT framework). We will state these theorems in the unifying framework of colored 
point processes, which allows us to obtain results for both of the classical approaches: 
block maxima (block = year) and threshold exceedance. 


6.2.1 Some Definitions 


For clarity, in what follows, let us first recall several definitions. 


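Definition 6.1 (GEV distribution) The generalized extreme value distribution is
defined in terms of its cumulative distribution function

$$F_{GEV}(x|\xi) = \begin{cases} \exp\left\{-(1 + \xi x)^{-1/\xi}\right\} & \text{if } 1 + \xi x > 0,\ \xi \neq 0, \\ \exp\left\{-\exp(-x)\right\} & \text{if } x \in \mathbb{R},\ \xi = 0, \end{cases}$$

where $\xi \in \mathbb{R}$ is the shape parameter. More generally, we define

$$F_{GEV}(x|\xi, \sigma, \mu) = F_{GEV}\left(\left.\frac{x - \mu}{\sigma}\,\right|\,\xi\right),$$

where $\mu \in \mathbb{R}$ and $\sigma \in \mathbb{R}_+^*$ are, respectively, the position and scale parameters.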


Fig. 6.1 GEV distribution $F_{GEV}(\cdot|\xi)$: evolution of the quantile of order $1 - p$ as a
function of $-\log[-\log(1 - p)]$, where $\xi = -0.2$, 0 and 0.2


The GEV distribution corresponds to three different distributions written using the
same formulation. When $\xi > 0$, it is a Fréchet distribution defined for $x > \mu - \sigma/\xi$.
When $\xi < 0$, it is a (reversed) Weibull distribution defined for $x < \mu - \sigma/\xi$. Lastly, it
is a Gumbel distribution when $\xi = 0$, and is thus defined on $\mathbb{R}$. Figure 6.1 illustrates
the behavior of the three GEV distributions; the evolution of the order $1 - p$ quantile
is plotted as a function of $-\log[-\log(1 - p)]$. The straight line corresponds to the
Gumbel distribution, the convex one to the Fréchet distribution, and the concave
one to the Weibull distribution.
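
The cumulative distribution function of Definition 6.1 and its inverse can be sketched as follows, together with the return level of period T, taken as the quantile of order 1 − 1/T of the distribution of annual maxima. The function names and the numerical parameter values are illustrative assumptions.

```python
import math


def gev_cdf(x, xi=0.0, sigma=1.0, mu=0.0):
    """GEV cumulative distribution function with shape xi, scale sigma, position mu."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    s = 1.0 + xi * z
    if s <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-s ** (-1.0 / xi))


def gev_quantile(p, xi=0.0, sigma=1.0, mu=0.0):
    """Quantile of order p, for 0 < p < 1."""
    y = -math.log(p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


def return_level(period_years, xi, sigma, mu):
    """Level exceeded on average once every period_years (annual maxima)."""
    return gev_quantile(1.0 - 1.0 / period_years, xi, sigma, mu)


# Hypothetical parameters: centennial level of a Fréchet-type GEV
level_100 = return_level(100, xi=0.2, sigma=2.0, mu=10.0)
```

Note that some statistical libraries parameterize the shape with the opposite sign, so the convention should be checked before comparing results.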

The GEV distribution has an important property: max-stability.¹ The interpretation
of max-stability is the following: if we take a sample of GEV-distributed variables,
the maximum of that sample again follows a GEV distribution, up to a change of
position and scale. This scale invariance can be written more formally in the
unidimensional case as follows:


Definition 6.2 (Max-stability for univariate distributions) Let $X$ be a real-valued
random variable with cumulative distribution function $F$. Then, $F$ is said to be
max-stable if for any $n \ge 2$, there exist sequences $(a_n)_n \in \mathbb{R}$ and $(b_n)_n \in \mathbb{R}_+^*$ such that,
for all $x \in \mathbb{R}$

$$F^n(a_n + b_n x) = F(x).$$
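
As a short worked check of this definition, consider the standard Gumbel distribution $F(x) = \exp\{-\exp(-x)\}$ with the choice $a_n = \log n$ and $b_n = 1$ (this particular choice of sequences is given here for illustration):

```latex
F^n(\log n + x)
  = \exp\left\{-n \exp(-x - \log n)\right\}
  = \exp\left\{-n \cdot \tfrac{1}{n} \exp(-x)\right\}
  = \exp\left\{-\exp(-x)\right\}
  = F(x).
```

The maximum of $n$ independent Gumbel variables is thus again a Gumbel variable, simply shifted by $\log n$.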


¹ More precisely, this property exactly and uniquely characterizes the family of GEV distributions
(Theorem 3.2 in [173]).


Fig. 6.2 GPD distribution $F_{GPD}(\cdot|\xi, \sigma)$: evolution of the quantile of order $1 - p$ as a
function of $-\log[-\log(1 - p)]$, where $\xi = -0.2$, 0, and 0.2 and $\sigma = 1$


Definition 6.3 (GPD distribution) The generalized Pareto distribution is defined in
terms of its cumulative distribution function

$$F_{GPD}(x|\xi, \sigma) = \begin{cases} 1 - \left(1 + \xi \dfrac{x}{\sigma}\right)^{-1/\xi} & \text{if } 1 + \xi \dfrac{x}{\sigma} > 0,\ \xi \neq 0, \\ 1 - \exp\left\{-\dfrac{x}{\sigma}\right\} & \text{if } x \ge 0,\ \xi = 0, \end{cases}$$

where $\xi \in \mathbb{R}$ is the shape parameter and $\sigma \in \mathbb{R}_+^*$ the scale parameter.
More generally, we define

$$F_{GPD}(x|\xi, \sigma, u) = F_{GPD}(x - u|\xi, \sigma),$$

where $u \in \mathbb{R}$ is a threshold parameter.
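
A minimal sketch of the GPD cumulative distribution function with threshold, and of its inverse, is given below. The function names and numerical values are illustrative assumptions; values at or below the threshold are given probability zero.

```python
import math


def gpd_cdf(x, xi, sigma, u=0.0):
    """GPD cumulative distribution function with shape xi, scale sigma, threshold u."""
    y = x - u
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma)
    s = 1.0 + xi * y / sigma
    if s <= 0.0:
        # Only possible when xi < 0: x lies above the upper bound u - sigma / xi
        return 1.0
    return 1.0 - s ** (-1.0 / xi)


def gpd_quantile(p, xi, sigma, u=0.0):
    """Quantile of order p, for 0 <= p < 1."""
    if xi == 0.0:
        return u - sigma * math.log(1.0 - p)
    return u + sigma * ((1.0 - p) ** (-xi) - 1.0) / xi


# Hypothetical parameters: median exceedance level above a threshold of 30
median_level = gpd_quantile(0.5, xi=0.1, sigma=5.0, u=30.0)
```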


As for the GEV distribution, the GPD covers the three domains of attraction (Fréchet,
Weibull, and Gumbel) in a single formula. Figure 6.2 illustrates this behavior:
the evolution of the order $1 - p$ quantile is plotted as a function of $-\log[-\log(1 - p)]$:
the straight line corresponds to the Gumbel, the convex one to the Fréchet, and the
concave one to the Weibull.

The identification of independent clusters (or data aggregates) is a fundamental 
step in the analysis of extreme values. 


Definition 6.4 (Cluster) A cluster is a group of consecutive and correlated values 
from the same process, sharing some feature. For example, a cluster of exceedances 
can be seen as an upsurge of exceedances of a given threshold. 


In the case of river flow, a cluster might correspond to a river flooding: exceedances 
in the same cluster would all be due to the same flood. Similarly, in the case of a 
storm, a cluster might correspond to a strong wind gust during which the wind 
speed exceeds a given threshold over several consecutive (or almost consecutive) 
measurement times. 
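
In practice, clusters of exceedances are often identified with a simple runs rule: a cluster is closed as soon as a given number of consecutive values fall at or below the threshold. The sketch below illustrates this idea; the runs rule, the function names and the `run_length` parameter are assumptions chosen for the illustration and are not prescribed by the text.

```python
def decluster(series, threshold, run_length=1):
    """Group exceedances into clusters separated by run_length non-exceedances."""
    clusters, current, gap = [], [], 0
    for index, value in enumerate(series):
        if value > threshold:
            current.append((index, value))
            gap = 0
        elif current:
            gap += 1
            if gap >= run_length:
                clusters.append(current)
                current, gap = [], 0
    if current:
        clusters.append(current)
    return clusters


def cluster_maxima(clusters):
    """Return the (index, value) pair of the largest value of each cluster."""
    return [max(cluster, key=lambda pair: pair[1]) for cluster in clusters]


# Hypothetical flow series: three floods above the threshold 4
flows = [1, 5, 6, 1, 1, 7, 1, 8, 1, 1, 1, 9]
maxima = cluster_maxima(decluster(flows, threshold=4, run_length=2))
```

With `run_length=2`, the exceedances at positions 5 and 7 belong to the same cluster because only one value below the threshold separates them.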

Recall also that the stationarity of a random process is a fundamental feature 
when dealing with extreme values. 


Definition 6.5 (Stationary processes) A process is stationary if its probabilistic
properties are the same over time; i.e., its probabilistic behavior (in other words,
its probability distribution) over a fixed time window is the same, independent of
where the window is placed in time. We say we have second order stationarity when
the mean and variance (when they exist) do not depend on time. See Definition 4.1
for a formal characterization.


It is useful to remember that a stationary process can have temporal correlation, 
i.e., when the value taken by the process at a given time depends on the values it 
takes at earlier times. For example, the daily air temperature process is temporally 
correlated: the temperature on a given day is close to that of the day before, and like- 
wise gives an idea of the probable temperature the following day without, however, 
determining it completely. 

The extremal index $\theta$, defined below, is a quite useful scalar quantity as it makes
it possible to quantify the correlation of a stationary process in its extreme values,
and thus answer the following question: when a process exceeds a high threshold at
a given time, will it be followed by other exceedances or remain an isolated event?


Definition 6.6 (Extremal index $\theta$) Let $X_1, \ldots, X_n$ be a stationary process, and
consider an iid sample $\tilde{X}_1, \ldots, \tilde{X}_n$ following the same distribution as the $X_i$. We note
$X^* = \max(X_1, \ldots, X_n)$ and $\tilde{X}^* = \max(\tilde{X}_1, \ldots, \tilde{X}_n)$, and $F$ and $\tilde{F}$ the cumulative
distribution functions of $X^*$ and $\tilde{X}^*$, respectively. Following [463], there exists a real
number $\theta$ such that $0 \le \theta \le 1$, called the extremal index, for which

$$F = \tilde{F}^{\theta}.$$


This index quantifies the temporal correlation of a process in its extreme values.
Beirlant [65, p 380] has shown that $\theta$ represents the limit probability that the
exceedance of a threshold by a process is followed by a run of values below the
threshold, i.e., that it ends a cluster. Thus, when $\theta = 1$, values exceeding the threshold
tend to occur in isolation as the threshold is raised higher and higher. When $\theta < 1$,
values exceeding the threshold tend to occur in groups, the more so as $\theta$ gets smaller
and smaller. By considering the distribution of times between independent clusters, [65]
has shown that $\theta$ can also be interpreted as the reciprocal of the mean cluster size. See
also Definition 5.1 in a previous chapter for further interpretations and [133] for more
technical details.


6.2.2 Fundamental Theorems 


The principal theorems justifying the use of GEV and GPD distributions are due to 
Leadbetter. They apply to stationary processes whose temporal correlation in their 
extreme values is not too strong. We denote by $D(u_n)$ [65, p 373] the technical condition
which expresses this weak temporal correlation, where $u_n$ is a high threshold value.


Definition 6.7 (Condition $D(u_n)$) Let $I_{j,k}(u_n)$ denote the set of events of the form

$$\bigcap_{i \in I} \{X_i \le u_n\}, \quad I \subseteq \{j, \ldots, k\}.$$

For any events $A_1$ from $I_{1,\ell}(u_n)$ and $A_2$ from $I_{\ell+s,n}(u_n)$ with
$1 \le \ell \le n - s$, we have

$$|P(A_1 \cap A_2) - P(A_1) P(A_2)| \le \alpha(n, s),$$

where $\alpha(n, s_n) \to 0$ as $n \to \infty$ for a sequence of positive integers $s_n$ such that
$s_n = o(n)$.


In other words, the events $\{\max_{k \in I_1} X_k \le u_n\}$ and $\{\max_{k \in I_2} X_k \le u_n\}$ become
independent when $n$ increases if the sets of indices $I_i \subset \{1, \ldots, n\}$ are separated by a
relatively small distance $s = o(n)$ with respect to $n$.

We now detail the theorem showing the existence of a limit distribution for $M_n$,
the maximum of the stationary process $X$ over blocks of size $n$:

$$M_n = \max(X_1, \ldots, X_n), \qquad (6.1)$$

where $X_i$ is the value of the process at time $i$.


Theorem 6.1 (Leadbetter 1974 [461]) Let $X$ be a stationary process and $M_n$ defined
by (6.1). Suppose that there exist two real-valued sequences $(a_n)_n$ and $(b_n)_n$ such
that $a_n > 0$, and a nondegenerate cumulative distribution function $F$ such that

$$\lim_{n \to \infty} P\left(\frac{M_n - b_n}{a_n} \le x\right) = F(x) \quad \text{(convergence in distribution)}.$$

If the condition $D(u_n)$ is satisfied for the threshold $u_n = a_n x + b_n$ at all $x$ for which
$F(x) > 0$, then $F$ is a GEV cumulative distribution function $F_{GEV}(\cdot|\xi)$.


By inserting the sequences $b_n$ and $a_n$ as position and scale parameters of the GEV
distribution, we obtain

$$P(M_n \le x) \simeq F_{GEV}(x|\xi, \sigma, \mu).$$


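Note that when the process has temporal correlation in its extreme values (as
quantified by the extremal index $\theta$), the parameters $(\xi, \sigma, \mu)$ of the limit distribution
of the maxima are different from those, noted $(\tilde{\xi}, \tilde{\sigma}, \tilde{\mu})$, that we would have
obtained for a process with the same marginal distribution but without temporal
correlation. This difference is expressed in the following formulas, in which the shape
parameter is unchanged ($\xi = \tilde{\xi}$):

• if $\xi \neq 0$, then $\mu = \tilde{\mu} - \dfrac{\tilde{\sigma}}{\xi}\left(1 - \theta^{\xi}\right)$ and $\sigma = \tilde{\sigma}\,\theta^{\xi}$;

• if $\xi = 0$, then $\mu = \tilde{\mu} + \tilde{\sigma} \log \theta$ and $\sigma = \tilde{\sigma}$.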
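
These formulas can be checked numerically: raising the GEV distribution of the uncorrelated case to the power θ must give the GEV distribution with the transformed parameters. The sketch below performs this check for hypothetical parameter values; the function names are assumptions.

```python
import math


def gev_cdf(x, xi, sigma, mu):
    """GEV cumulative distribution function (shape xi, scale sigma, position mu)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    s = 1.0 + xi * z
    if s <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-s ** (-1.0 / xi))


def params_with_dependence(xi, sigma_iid, mu_iid, theta):
    """Map the parameters of the uncorrelated case to those with extremal index theta."""
    if xi == 0.0:
        return xi, sigma_iid, mu_iid + sigma_iid * math.log(theta)
    sigma = sigma_iid * theta ** xi
    mu = mu_iid - sigma_iid * (1.0 - theta ** xi) / xi
    return xi, sigma, mu


# Hypothetical values: the two quantities below should coincide
theta, x = 0.5, 3.0
xi, sigma, mu = params_with_dependence(0.2, 1.5, 2.0, theta)
lhs = gev_cdf(x, xi, sigma, mu)
rhs = gev_cdf(x, 0.2, 1.5, 2.0) ** theta
```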

Other results can be stated in the framework of colored point processes, which
record the moments at which events happen, as well as the value of a “color” at such
moments.

Definition 6.8 (Colored point process) A point process is a stochastic process (cf.
Sect. 4.1.5), where each draw is a set of points. It is said to be colored (or marked) 
if each point can be associated with some characteristic color (or mark). 


For example, a point may represent a spatial location and the color the intensity of
an event occurring at that location. A marked Poisson point process is the simplest
example of a colored point process: the number of events occurring in a spatial
or temporal region follows a Poisson distribution and the intensity of each event
follows a particular distribution. This process will be encountered later when we
want to quantify events corresponding to exceedances of extreme-valued thresholds.

In order to understand the extreme behavior of a process, it is valuable to look at the
times at which its values exceed a high threshold and associate these with a color
corresponding to the excess above the threshold. We can also extract the sequence
of times corresponding to cluster maxima and associate these with cluster size.

The elegance of Leadbetter’s work comes from how it unifies the MAXB and POT 
approaches. Indeed, the following theorem characterizes the number of clusters over 
a period of length $n$ as well as the distribution of the exceedances of cluster maxima
over $u_n$.


Theorem 6.2 (Leadbetter 1991 [462]) Let $X$ be a stationary process with extremal
index $\theta$ and marginal cumulative distribution function $F$. Let $u_n$ be a threshold
for which the condition $D(u_n)$ (limiting temporal correlation in extreme values) is
satisfied, such that

$$n(1 - F(u_n)) \to \tau < +\infty.$$

Then

(i) the number $N_n(u_n)$ of cluster maxima exceeding $u_n$, recorded over a period of
length $n$, converges in distribution when $n \to \infty$ to a Poisson distribution with
mean $\lambda_c(u_n) = \theta\tau$;

(ii) when $n \to \infty$, the exceedances of cluster maxima over $u_n$ converge in distribution
to a generalized Pareto distribution $F_{GPD}(\cdot|\xi, \sigma)$.


It is possible to link the intensity $\lambda_c(u)$ of the threshold ($u$) exceedance process
to the limit distribution of the maxima $F_{GEV}(\cdot|\xi, \sigma, \mu)$ by noting the equivalence of
the events

$$\{M_n \le u\} = \{N_n(u) = 0\}.$$

Then

$$F_{GEV}(u|\xi, \sigma, \mu) = P(M_n \le u) = P(N_n(u) = 0) = \exp(-\lambda_c(u)), \qquad (6.2)$$

which shows that $\lambda_c(u) = -\log F_{GEV}(u|\xi, \sigma, \mu)$. Hence, from Definition 6.1, for $\xi \neq 0$,

$$\lambda_c(u) = \left[1 + \xi\,\frac{u - \mu}{\sigma}\right]^{-1/\xi}.$$

In other words, the exceedances of an extreme-valued threshold follow a Poisson
process whose colors follow a generalized Pareto distribution.

Theorem 6.2 proves to be very useful in extreme value studies, since the
quantification of the distribution of cluster maxima exceeding a given threshold
makes it possible to model the tail of the marginal distribution $F$ of the process
$X$. This is based on the following max-stability result.


Proposition 6.1 (Beirlant 2005 [65, p 387]) The asymptotic distribution of the
exceedances of cluster maxima over $u_n$ is equal to the distribution of the exceedances
over $u_n$ of the process itself.


Thus, the asymptotic distribution $F_{GPD}(\cdot|\xi, \sigma)$ of the exceedances of cluster maxima
over $u_n$ is also that of $(X - u_n | X > u_n)$, which, setting $\alpha = \bar{F}(u) = 1 - F(u)$,
allows us to write

$$\bar{F}(x) \simeq \alpha\, \bar{F}_{GPD}\left(x - \bar{F}^{-1}(\alpha)\,|\,\xi, \sigma\right) \quad \text{for } x > u. \qquad (6.3)$$
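
The tail approximation above can be sketched as follows: the probability of exceeding a level x above the threshold u is the probability α of exceeding u multiplied by the GPD survival function evaluated at x − u. In practice α is often estimated by the proportion of observations above u; this estimate, the function name and the numerical values are assumptions made for the illustration.

```python
import math


def tail_probability(x, u, alpha, xi, sigma):
    """Approximate P(X > x) for x >= u, with alpha = P(X > u) and GPD(xi, sigma) excesses."""
    y = x - u
    if xi == 0.0:
        return alpha * math.exp(-y / sigma)
    s = 1.0 + xi * y / sigma
    if s <= 0.0:
        # Only possible when xi < 0: x lies above the upper bound of the support
        return 0.0
    return alpha * s ** (-1.0 / xi)


# Hypothetical values: 2% of observations exceed u = 30
p_exceed_50 = tail_probability(50.0, u=30.0, alpha=0.02, xi=0.1, sigma=5.0)
```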


Note that the interest in determining the distribution $F_{GPD}(\cdot|\xi, \sigma)$ on cluster maxima
rather than directly on process exceedances comes from the fact that these
cluster maxima are independent by construction. For estimating the parameters of
$F_{GPD}(\cdot|\xi, \sigma)$, it is possible to use a maximum likelihood approach that requires this
independence. Direct inference using the process’s exceedances of a threshold certainly
has the advantage of increasing the number of data points, but it also requires
modifications to estimators and their confidence intervals to take into account the
temporal dependence of the data. This approach was chosen in [127, 227].

Connections exist between the distribution $F_{GPD}(\cdot|\xi, \sigma)$ of cluster maxima
exceedances over a threshold $u$ (and thus of the process exceedances above $u$ too)
and the limit distribution $F_{GEV}(\cdot|\xi)$ of the process maxima. In particular, the shape
parameters $\xi$ are equal. However, the position and scale parameters of $F_{GPD}$ and $F_{GEV}$
are different. This is a consequence of the following proposition from Pickands’s
foundational article [615].


Proposition 6.2 (Pickands 1975 [615]) There is equivalence between the convergence
in distribution of the maximum towards a GEV distribution, noted $F_{GEV}(\cdot|\xi)$,
and that of an exceedance distribution towards a GPD, noted $F_{GPD}(\cdot|\xi, \sigma)$. Furthermore,
these two distributions have the same shape parameter $\xi$:

$$\lim_{n \to \infty} P\left(\frac{M_n - b_n}{a_n} \le x\right) = F_{GEV}(x|\xi)$$

if and only if

$$\lim_{u \to x_F} \ \sup_{y \in [0, x_F - u]} \left|F_u(y) - F_{GPD}(y|\xi, \sigma(u))\right| = 0,$$

where $x_F \in \overline{\mathbb{R}}$ is the upper bound of the support of $F$, and $F_u$ the cumulative
distribution function of exceedances of $X$ over $u$, i.e., of $(X - u | X > u)$.


An important property of GPD distributions, which is the counterpart of max-stability
for GEV distributions, is shape-stability when the threshold $u$ is modified.
This property is sometimes called heredity and is the subject of the following
proposition.

Proposition 6.3 (Stability of modified GPD parameters (heredity)) Let $X$ be
a stationary process. If the exceedances of $X$ over $u_s$ follow a GPD distribution
$F_{GPD}(\cdot|\xi, \sigma)$, then the exceedances of $X$ over $u > u_s$ follow the distribution
$F_{GPD}(\cdot|\xi, \sigma_u)$, where $\sigma_u = \sigma + \xi(u - u_s)$. The modified scale parameter
$\sigma^*(u) = \sigma_u - \xi u$ is thus constant with respect to $u$.
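
A short derivation shows where the new scale parameter comes from. Writing $d = u - u_s$ and $Y = X - u_s$ for the exceedance over $u_s$, and taking $\xi \neq 0$ (the notation $d$ and $Y$ is introduced here for the illustration), we have for $y \ge 0$:

```latex
P(X - u > y \mid X > u)
  = \frac{P(Y > d + y)}{P(Y > d)}
  = \left[\frac{1 + \xi (d + y)/\sigma}{1 + \xi d/\sigma}\right]^{-1/\xi}
  = \left[1 + \frac{\xi y}{\sigma + \xi d}\right]^{-1/\xi},
```

which is the survival function of a GPD with the same shape $\xi$ and scale $\sigma + \xi(u - u_s)$. Subtracting $\xi u$ from this scale leaves $\sigma - \xi u_s$, which no longer depends on $u$.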


Incidentally, this result allows us to easily calculate an important function of 
interest: the expectation of the exceedances of X over the modified threshold. The 
plot of this function is usually called the mean excess plot. 


Proposition 6.4 (Linearity of the conditional expectation of exceedances) Let $X$
be a stationary process. If the exceedances over $u_s$ of $X$ have the GPD distribution
$F_{GPD}(\cdot|\xi, \sigma_{u_s})$ with $\xi < 1$, then the expectation of the exceedances of $u > u_s$
by $X$ is given by

$$E[X - u | X > u] = \frac{\sigma_{u_s} + \xi (u - u_s)}{1 - \xi},$$

which is linear in $u$.

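
The empirical counterpart of this conditional expectation, used to draw a mean excess plot, can be sketched as follows. The function name and the data are hypothetical; thresholds above which no observation remains return NaN.

```python
def mean_excess(data, thresholds):
    """Empirical mean of (x - u) over the observations x > u, for each threshold u."""
    result = []
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        result.append(sum(excesses) / len(excesses) if excesses else float("nan"))
    return result


# Hypothetical data: mean excess over three candidate thresholds
values = mean_excess([1, 2, 3, 4, 5], [0, 2, 4])
```

Plotting these values against the thresholds and looking for the region where the curve becomes approximately linear is a common way to guide the choice of threshold.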
In conclusion, the fundamental results recalled here guide all extreme value stud- 
ies. Indeed, the study of cluster maxima will allow us to: 


1. Evaluate the mean number of independent clusters exceeding the threshold u; this 
value will be essential when evaluating annual frequencies defined as the mean 
number of independent clusters exceeding u; 

2. Model the tail of the marginal distribution of the process and thus estimate the
probability of exceeding any threshold greater than $u$;

3. Estimate specific return period levels. 


In the analysis of the extreme values of a process modeling environmental vari- 
ables, the typical block considered is a year, often limited to a time period where the 
process is stationary. 


6.2.3 Practical Difficulties 


The theorems stated earlier, on which rest all of the results of extreme value studies, 
are asymptotic convergence results. They therefore suppose that the number of data 
points n tends to infinity, which is clearly not the case in practice. The finite nature 
of real data sets can lead to modeling errors and parametric estimation errors, which 
we detail below.


6.2.3.1 Choosing Data Representation


When practitioners want to model the distribution of block maxima using a GEV 
distribution, they are confronted with the following issues: 


• Choosing large blocks (i.e., considering maxima over long time periods) minimizes
the modeling error; the distribution of maxima indeed tends to a GEV distribution.
On the other hand, this considerably reduces the number of data points available
to estimate the model parameters, as only one point is retained per block. Thus,
the estimated GEV parameters risk being imprecise.

• Choosing small blocks increases the modeling error, as the block maxima distribution
is not yet close to its GEV limit. Nevertheless, more data is
available for estimating the GEV parameters, which reduces estimation error.


When modeling the distribution of exceedances of a threshold by a GPD, practi- 
tioners face similar difficulties: 


• Choosing a high threshold leads to a good fit of the exceedances distribution to a
GPD, but drastically reduces the number of data available to estimate the GPD’s
parameters. Essentially, we have only the points above the threshold to work with.

• Choosing a lower threshold increases the modeling error for the exceedances
distribution, but decreases parameter estimation errors.


In conclusion, we are faced with the following dilemma: estimate poorly a good 
model, or estimate well a bad model. Which should we choose? The notion of bias- 
variance tradeoff, introduced in Sect. 4.2.21 and applied to estimators of quantities of 
interest (return levels), could be a way to orient our choice, on the condition that we 
have a way to generate independent data from the chosen model (e.g., nonparametric 
bootstrap [244]) and sufficient data for validating the quality of return levels. 


6.2.3.2 Minimal Quantity of Data Required 


To the quantitative elements in Chap. 5, let us add the following conclusions drawn 
from a large number of case studies. 

Depending on the environmental variable in question, the number of years of
observations can vary from a dozen up to several decades, which makes the statistical
estimation of model parameters very difficult, if not impossible. For example, hourly
river flow measurements often exist for periods of 80 years or more, while extreme
tide measurements generally cover shorter periods, typically ten to twenty years.

Note that even in cases where measurements exist over long time periods, we have 
to be careful about their homogeneity, which may be impacted by a technological 
change in the measuring equipment or a significant change in the environment. In such 
cases, it is generally necessary to truncate the data series, unless information is
available to deal statistically with such changes in historical data (cf. Sect. 5.2.2). In
this context, the French national meteorological service (Météo-France) recommends
not going further back than 1981 in historical wind speed studies due to changes in
equipment, i.e., homogeneity cannot be guaranteed before this date.


Example 6.1 (Hydroelectric impact on river data) 


The construction of hydroelectric power stations on the Garonne river signif- 
icantly modified the river system, which can be seen in a sudden change in 
extreme flow values in the 1950s. This change point has until now restricted 
the analysis of historical data to records taken after 1950, even though flow 
measurements exist back to 1915. 


In addition, the temporal correlation found in physical phenomena further reduces
the amount of data that can actually be used for the statistical estimation of models.
Indeed, the greater the temporal correlation, the less information a dataset of size
$n$ contains. Thus, if we quantify correlation in terms of the extremal index $\theta$, a
correlated dataset of size $n$ is equivalent to an independent one of size $n\theta$
(cf. Definition 5.1). This is indeed what the asymptotic results show: the size $n$
of an independent stationary process is replaced by $n\theta$ for a stationary process
with temporal correlation. Such correlation thus amplifies the difficulties associated
with having small quantities of data.

Hence, even if (as we have already emphasized) the POT approach makes use of
more data points than the MAXB approach, the problem of being short of data is not
entirely resolved. Indeed, in processes with temporal correlation, the data available
to POT must be reduced to the set of cluster maxima, which satisfy the independence
hypothesis. This declustering* considerably decreases the amount of data that can be
used to estimate the GPD distribution.

Generally speaking, the modeling of exceedances above a threshold with a GPD 
distribution is very sensitive to the choice of this threshold. Several techniques can 
help choose this threshold well—see Sect. 6.4.5. 

Furthermore, estimation of the shape parameter $\xi$ is quite tricky. Physical
considerations on the boundedness of the magnitudes in question may help. For
instance, river flow is bounded by the quantity of water found in its watershed. It is
thus recommended to model the tails of flow distributions with $\xi < 0$. An error in
the sign of $\xi$ (or in deciding whether it is zero) would lead to a modeling error
which could have significant consequences on the estimation of quantities calculated
as levels of large return periods. Section 6.6 gives an idea of the research efforts
devoted to the estimation of this parameter. In practice, both the POT and MAXB
approaches are used and the choice of one over the other mainly depends on the
amount of data available and temporal correlation considerations.

6.2.4 Why the Choice of Generalized Distributions?


Algebraically speaking, it might seem easier to simply use one of the three
possible distributions (Gumbel, Weibull, and Fréchet) rather than the GEV one.
However, practice shows that in most cases, GEV estimators are more robust, in
the sense that the estimated distribution provides more physically plausible results
in terms of return levels and periods. The example of the torrential rain that fell on
Vargas (Venezuela) in 1999 illustrates this. In December 1999, two days of high
rainfall (120 mm followed by 410.4 mm the next day) provoked mudflows and the
death of more than 30,000 people [203]. Coles and Pericchi [175] have studied, with
GEV and Gumbel distributions, the distribution of annual rainfall maxima in this
geographic region. Their GEV fit gave a return period of 4280 years, while the
Gumbel fit suggested 17.6 million years.


Remark 6.1 For a long time now, the Gumbel distribution, easier to estimate with 
small sample sizes (because it has one less parameter) has remained the most-used 
distribution for modeling the extremes of meteorological variables [204], notably 
rainfall [444]. Nevertheless, as shown by Jenkinson in 1955 [413], then Koutsoyiannis 
[445], this distribution comes with risks for structural engineers as it frequently 
underestimates the return levels associated with long periods. This behavior can be 
explained, in particular, by the extremely slow convergence of distributions of events 
occurring in Gumbel’s domain of attraction, which necessitates the use of large-scale 
blocks (periods) [445]. A further argument supporting the use of GEV distributions 
in place of Gumbel and Fréchet ones is the non-negligible risk of not being able to 
differentiate between the two when sample sizes are small [53]. Other practical and 
more general reasons for working with GEVs are provided in [184]. 


6.3 What Does Extreme Mean for a Natural Hazard? 


As we stated in the introduction, in the design phase, structures are sized
with respect to the extreme values of some hazardous event. Similarly, the robustness 
of an existing structure is very often defined in terms of its ability to resist a certain 
level of extremality, defined a priori, of a given hazard. This level is described in 
terms of a return level or annual occurrence frequency. 

Before continuing, we recall the definitions of return level and frequency of 
extreme values of a hazard, along with interpretations to help better understand 
these concepts. 

6.3.1 Return Levels


In this section, consider a process X that is stationary over a period A of each year, 
and suppose that the extreme values of X which interest us occur during A. In prac- 
tice, A corresponds to the season in which extreme values of X occur, e.g., the 
summer for heatwaves (high air temperature), fall and winter for high river levels 
(large flows), etc. 


6.3.1.1 The GEV Case 


The annual maximum of a process occurs during the period A. We denote its value
$M_n$, where $n$ is the number of measurements taken during A. Return period levels
are then defined as particular quantiles of the asymptotic distribution of the maxima
$M_n$.


Definition 6.9 (Return period levels) If $F$, the marginal distribution of the stationary
process $X$, is in the domain of attraction of the GEV distribution with cumulative
distribution function $F_{\mathrm{GEV}}(\cdot \mid \xi, \sigma, \mu)$, then the level $x_T$ of the “T years” return period
of $X$ is defined by

$$F_{\mathrm{GEV}}(x_T \mid \xi, \sigma, \mu) = 1 - 1/T. \tag{6.4}$$

For large $n$ we have:

$$P(M_n > x_T) = \frac{1}{T}.$$

The return level $x_T$ is therefore the order-$(1 - 1/T)$ quantile of the asymptotic
distribution of the annual maxima $M_n$:

$$x_T = \begin{cases} \mu + \dfrac{\sigma}{\xi}\left[\left(-\log(1 - 1/T)\right)^{-\xi} - 1\right] & \text{if } \xi \neq 0,\\[2mm] \mu - \sigma \log\left(-\log(1 - 1/T)\right) & \text{if } \xi = 0. \end{cases} \tag{6.5}$$
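As a minimal sketch of Eq. (6.5), the return level can be computed directly from the GEV parameters. The function names and the parameter values below are illustrative assumptions, not taken from the text; the loop simply checks Eq. (6.4) numerically.

```python
import math

def gev_cdf(x, xi, sigma, mu):
    """GEV cumulative distribution function with parameters (xi, sigma, mu)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:                 # Gumbel case, xi = 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                        # x lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(T, xi, sigma, mu):
    """Return level x_T of Eq. (6.5) for a return period of T > 1 years."""
    y = -math.log1p(-1.0 / T)           # y = -log(1 - 1/T) > 0
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Hypothetical parameter values, for illustration only.
for T in (10, 100, 1000):
    x_T = gev_return_level(T, xi=0.1, sigma=2.0, mu=10.0)
    # By Eq. (6.4), the GEV cdf evaluated at x_T should equal 1 - 1/T.
    print(T, round(x_T, 3), round(gev_cdf(x_T, 0.1, 2.0, 10.0), 6))
```

The threshold `1e-12` used to switch to the Gumbel formula is an arbitrary numerical choice.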


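Therefore, the use of asymptotically Gaussian estimators $(\hat\xi, \hat\sigma, \hat\mu)$ with
asymptotic variance $V$ (such as those produced by the methods described in
Sect. 6.4.2, where $V$ is the matrix $V_1$ or $V_2$), allows us, with the help of the delta
method (cf. Theorem 4.4), to construct estimators $\hat x_T$ that are also asymptotically
Gaussian:

$$\sqrt{m}\,(\hat x_T - x_T) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\left(0, K V K^T\right) \tag{6.6}$$

when $m \to +\infty$, where $K$ is defined by

$$K = \left[\frac{\partial x_T}{\partial \sigma},\ \frac{\partial x_T}{\partial \xi},\ \frac{\partial x_T}{\partial \mu}\right],$$

where

$$K_{1,1} = \frac{1}{\xi}\left[\left(-\log(1 - 1/T)\right)^{-\xi} - 1\right],$$

$$K_{1,2} = -\frac{\sigma}{\xi}\left[K_{1,1} + \left(-\log(1 - 1/T)\right)^{-\xi}\log\left(-\log(1 - 1/T)\right)\right],$$

$$K_{1,3} = 1.$$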
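The delta-method step can be sketched as follows, assuming $\xi \neq 0$. The function names are hypothetical, and the matrix `V` below is a made-up positive definite matrix used only to show the mechanics; in practice it would be $V_1$ or $V_2$ evaluated at the estimates.

```python
import math

def return_level_gradient(T, xi, sigma, mu):
    """Row vector K = (dx_T/dsigma, dx_T/dxi, dx_T/dmu), for xi != 0."""
    y = -math.log1p(-1.0 / T)                      # y = -log(1 - 1/T)
    k11 = (y ** (-xi) - 1.0) / xi
    k12 = -(sigma / xi) * (k11 + y ** (-xi) * math.log(y))
    return [k11, k12, 1.0]

def delta_method_std_error(K, V, m):
    """Approximate standard error of the estimated return level:
    sqrt(K V K^T / m), where V is the asymptotic covariance matrix of
    sqrt(m) * (sigma_hat, xi_hat, mu_hat), in that order."""
    quad = sum(K[i] * V[i][j] * K[j] for i in range(3) for j in range(3))
    return math.sqrt(quad / m)

# Hypothetical values: 60 annual maxima and an invented covariance matrix.
K = return_level_gradient(T=100, xi=0.1, sigma=2.0, mu=10.0)
V = [[4.0, 0.2, 1.0],
     [0.2, 0.5, -0.1],
     [1.0, -0.1, 3.0]]
print(K, delta_method_std_error(K, V, m=60))
```

The ordering $(\sigma, \xi, \mu)$ of the rows and columns of `V` must match the ordering of the entries of `K`.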


Now let us give a simpler interpretation of the level corresponding to a return
period of $T$ years. It is based on the convergence of the cluster maxima which
exceed the level $u$ to a Poisson process with intensity $\lambda_c(u)$. Indeed, convergence
of cluster arrivals to a Poisson process implies that the time between the arrival
of two cluster maxima is a random variable $Y(u)$, which follows an exponential
distribution with parameter $\lambda_c(u)$. Thus, the mean duration between the arrival of
two independent clusters (or exceedances) is $E[Y(u)] = 1/\lambda_c(u)$. The asymptotic
approximation (6.2)

$$\lambda_c(u) \approx n\theta\,\bar F(u) \approx -\log F_{\mathrm{GEV}}(u \mid \xi, \sigma, \mu),$$

applied at the level $x_T$, shows that

$$E[Y(x_T)] = -\frac{1}{\log(1 - 1/T)} \approx T \quad \text{for } T \gg 1,$$


which leads to the following—very common—interpretation. Recall that Proposi- 
tion 6.5 on return levels is a characterization and not a definition (which is provided 
in (6.4)). 


Proposition 6.5 (Interpretation of a return level of T years) A return level of T 
years for an extreme-valued hazard can be interpreted as follows: 


• On average, there are T years between extreme episodes of a process with a return
period of T years.

• An episode exceeding the level with a return period of T years occurs on average
once every T years.
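To see how close the mean waiting time is to $T$, one can expand the exact expression for large $T$. This is a short worked step added for illustration; it is not part of the original derivation.

```latex
E[Y(x_T)] = -\frac{1}{\log(1 - 1/T)}
          = T - \frac{1}{2} - \frac{1}{12\,T} + O\!\left(T^{-2}\right).
```

For $T = 100$ this gives about 99.5 years, so the approximation $E[Y(x_T)] \approx T$ is off by roughly half a year, whatever the value of $T$.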


6.3.1.2 The GPD Case 


However, practitioners do not always use (6.4) for estimating a return level $x_T$ since
it depends on an estimation of the asymptotic distribution of the annual maxima. In
order to make best use of their data, they prefer to study the cluster maxima process,
which provides an estimate of the tail of the marginal distribution $F$.
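This approach is straightforward since it is easy to link a level $x_T$ with a particular
quantile of $F$ using the relationship $F_{\mathrm{GEV}}(\cdot \mid \xi, \sigma, \mu) = F^{n\theta}$ for large $n$, which comes
from Definition 6.6, where $\theta$ is the process’s extremal index. Applied to $x_T$, we obtain

$$F(x_T) = (1 - 1/T)^{1/(n\theta)},$$

which turns the return level $x_T$ into the quantile of order $\alpha = (1 - 1/T)^{1/(n\theta)}$ of $F$.
Then, if the excess of maxima above $u$ converges towards a Pareto distribution, it
follows from (6.3) that

$$x_T = \begin{cases} u + \dfrac{\sigma}{\xi}\left[\left(\dfrac{1-\alpha}{\zeta_u}\right)^{-\xi} - 1\right] & \text{if } \xi \neq 0,\\[3mm] u - \sigma\log\left(\dfrac{1-\alpha}{\zeta_u}\right) & \text{if } \xi = 0, \end{cases}$$

where $\zeta_u = P(X_i > u)$.
When $T$ is large, we can make the approximation $1 - \alpha \approx 1/(n\theta T)$, which leads
to the well-known equations [173]

$$x_T = \begin{cases} u + \dfrac{\sigma}{\xi}\left[(n\theta T\zeta_u)^{\xi} - 1\right] & \text{if } \xi \neq 0,\\[2mm] u + \sigma\log(n\theta T\zeta_u) & \text{if } \xi = 0. \end{cases} \tag{6.7}$$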
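The two versions of the GPD return level, with and without the large-$T$ approximation, can be compared with a short sketch. Function names and numerical values are assumptions made for illustration; `n` is the number of observations per year.

```python
import math

def gpd_return_level(T, u, sigma, xi, zeta_u, theta, n):
    """Approximate return level x_T of Eq. (6.7), valid for large T."""
    # n * theta * T * zeta_u: expected number of independent clusters
    # exceeding u over T years.
    m = n * theta * T * zeta_u
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)

def gpd_return_level_exact(T, u, sigma, xi, zeta_u, theta, n):
    """Same level without the large-T approximation, using the quantile
    of order alpha = (1 - 1/T)**(1/(n*theta)) of F."""
    one_minus_alpha = -math.expm1(math.log1p(-1.0 / T) / (n * theta))
    p = one_minus_alpha / zeta_u
    if abs(xi) < 1e-12:
        return u - sigma * math.log(p)
    return u + sigma / xi * (p ** (-xi) - 1.0)

# Hypothetical daily data: threshold exceeded 5% of the time, theta = 0.5.
args = dict(u=30.0, sigma=5.0, xi=0.1, zeta_u=0.05, theta=0.5, n=365)
for T in (10, 100, 1000):
    print(T, gpd_return_level(T, **args), gpd_return_level_exact(T, **args))
```

The gap between the two values shrinks as $T$ grows, which is the sense in which (6.7) is a large-$T$ approximation.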


The probability $\zeta_u$ can be estimated by the fraction $n_u/n$, where $n_u$ is the number
of exceedances of $u$ by the process (though the variance of this estimator increases
with temporal correlation in the process). Similarly, the parameter $\theta$, whose inverse
corresponds to an estimate of the mean cluster length, can be estimated by the fraction
$n_c/n_u$, where $n_c$ is the number of clusters exceeding $u$.
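These two ratios can be sketched as follows. The text does not say how clusters are identified, so the code assumes a simple runs declustering rule: two exceedances belong to different clusters when they are separated by at least `run_length` consecutive values below the threshold. The function names and the parameter `run_length` are hypothetical.

```python
def decluster(x, u, run_length=1):
    """Group the exceedances of u into clusters (runs declustering)."""
    clusters = []
    last = None                          # index of the previous exceedance
    for i, v in enumerate(x):
        if v > u:
            if last is None or i - last > run_length:
                clusters.append([v])     # start a new cluster
            else:
                clusters[-1].append(v)   # same cluster as the previous exceedance
            last = i
    return clusters

def tail_frequencies(x, u, run_length=1):
    """Estimates zeta_u = n_u / n and theta = n_c / n_u, plus the cluster maxima."""
    clusters = decluster(x, u, run_length)
    n_u = sum(len(c) for c in clusters)
    n_c = len(clusters)
    zeta_u = n_u / len(x)
    theta = n_c / n_u if n_u else float("nan")
    return zeta_u, theta, [max(c) for c in clusters]

# Tiny invented series, for illustration only.
series = [0, 5, 6, 0, 0, 7, 0, 8, 9, 9]
print(tail_frequencies(series, u=4, run_length=2))
```

The estimate of $\theta$ depends on `run_length`, so it is worth checking how sensitive the results are to this choice.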

As in the GEV case, the use of asymptotically Gaussian estimators, with asymptotic
variance $V$ (such as those resulting from the methods in Sect. 6.4.2, where $V$
is the matrix $V_3$ or $V_4$), for the parameters $(\sigma, \xi)$ enables us, using the delta method
(cf. Theorem 4.4), to construct estimators $\hat x_T$ which are themselves asymptotically
Gaussian:

$$\sqrt{N_u}\,(\hat x_T - x_T) \longrightarrow \mathcal{N}\left(0, L V L^T\right) \ \text{in distribution when } N_u \to +\infty, \tag{6.8}$$

where the matrix $L$ is defined by

$$L = \left[\frac{\partial x_T}{\partial \sigma},\ \frac{\partial x_T}{\partial \xi}\right],$$

with

$$L_{1,1} = \frac{p^{-\xi} - 1}{\xi}, \qquad L_{1,2} = -\frac{\sigma}{\xi^2}\left(p^{-\xi} - 1\right) - \frac{\sigma}{\xi}\,p^{-\xi}\log p,$$

where we have set $p = \dfrac{1-\alpha}{\zeta_u}$ and $\alpha = (1 - 1/T)^{1/(n\theta)}$ for clarity.


6.3.2 Annual Frequency 


Let us now return to the stationarity hypotheses on the process X over the period A
of size $n$, and consider the extreme-valued hazard $\mathcal{E} = \{X > x\}$, where $x$ is a high
threshold value. By the stationarity of X, the probability $p = P(\mathcal{E})$ is independent of
the time $t$ for any $t$ in A.


Remark 6.2 We often see the use of the term daily probability for p, which is 
surprising in the sense that a probability is dimensionless. Nevertheless, continuous 
processes X are often projected onto a daily grid. For example, flow measurements 
are mean daily ones, wind speeds are the daily maxima of measurements taken 
every 10 min, etc. Hence, the time $t$ corresponds to a day. The term daily probability
therefore simply reminds us that this probability corresponds to events defined on 
one day long periods. 


The stationarity hypothesis on X does not preclude the process from having temporal
correlation. However, when it does, it turns out that the annual frequency of
the event $\mathcal{E}$ can be defined in different ways. In this book, we propose three possible
definitions, which correspond to whether we consider all of the occurrences of $\mathcal{E}$
over the year, or only the independent ones in which $\mathcal{E}$ occurs.


6.3.2.1 Several Possible Definitions 


Suppose that observations are daily and we want to estimate the annual frequency of
the event $\mathcal{E}$. The explanations below generalize well to other units of time.


Definition 6.10 (Annual frequency – 1) The annual frequency of the event $\mathcal{E}$ is the
mean number of days per year where $\mathcal{E}$ occurs.


This point of view is based on a mean magnitude. Thus, from one year to the next,
the actual number of days where $\mathcal{E}$ occurs can vary greatly around the mean in
question. In particular, if the temporal correlation in X is strong, the number of days
in a year where $\mathcal{E}$ occurs can be much larger or much smaller than the estimated
mean.

In general, the stronger the correlation, the less the mean value is representative
of the number of days that $\mathcal{E}$ actually occurs.


Definition 6.11 (Annual frequency – 2) The annual frequency of the event $\mathcal{E}$ is the
mean number of episodes per year where $\mathcal{E}$ occurs.


This definition introduces the notion of episode, similar to that of cluster. An
episode groups together the correlated times at which $\mathcal{E}$ occurs.
When X exceeds the threshold $x$ at time $t$, the correlation between time instants
means that the process will likely exceed $x$ during a certain number of subsequent
instants. Each such wave of correlated exceedances is counted as “1”.

Here, the annual frequency thus corresponds to the mean number of waves of
exceedances per year, irrespective of their lengths.


Definition 6.12 (Annual frequency – 3) The annual frequency of the event $\mathcal{E}$ is the
probability that $\mathcal{E}$ occurs at least once per year.


In other words, it is the probability that there is at least one episode of exceedances 
in a given year, or equivalently, the probability that the annual maximum exceeds 
the threshold x. This definition again uses the notion of episode. On the other hand, 
the annual frequency estimated here is simply the probability of having an episode 
of exceedances at least once in a given year, no matter the mean annual number 
of episodes. This definition is especially relevant if the occurrence of the episode in 
question is critical for the industrial installation that is affected by it. By analogy with 
faulty materials, the installation will be beyond repair to its initial state after such 
an episode occurs. Thus, all that interests us is the occurrence or not of the episode. 


Remark 6.3 Note that Definitions 6.10 and 6.11 are mean numbers of days or
episodes. These frequencies are thus positive numbers that may be greater than 1.
In the case of extreme values, the event $\mathcal{E}$ has a very small probability of occurring,
which will lead to small annual frequencies, likely (though not necessarily) less than 1
for Definitions 6.10 and 6.11. On the other hand, Definition 6.12 defines the annual
frequency as a probability. By definition, it is, therefore, a positive number no greater
than 1.


6.3.2.2 Calculating the Annual Frequency 


Depending on the definition chosen, the calculation is different. Expressions for 
these annual frequencies can be easily obtained with the help of results on Poisson 
processes for exceeding thresholds. They are given in the following proposition. 


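Proposition 6.6 (Annual frequencies) Denoting $F_a^{6.10}(\mathcal{E})$, $F_a^{6.11}(\mathcal{E})$, and $F_a^{6.12}(\mathcal{E})$ the annual
frequency of the event $\mathcal{E}$ for Definitions 6.10, 6.11, and 6.12, respectively, we have that:

$$\begin{aligned} F_a^{6.10}(\mathcal{E}) &= np,\\ F_a^{6.11}(\mathcal{E}) &= np\theta,\\ F_a^{6.12}(\mathcal{E}) &= 1 - \exp(-np\theta). \end{aligned} \tag{6.9}$$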

Remark 6.4 The expressions $F_a^{6.10}(\mathcal{E})$ and $F_a^{6.11}(\mathcal{E})$ are the same when $\theta = 1$, i.e.,
when the process has no temporal correlation in its extreme values. Similarly, for
extremely rare events, $np\theta \ll 1$, and thus $F_a^{6.12}(\mathcal{E}) \approx F_a^{6.11}(\mathcal{E})$.
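The three annual frequencies of Proposition 6.6 are straightforward to compute once $n$, $p$ and $\theta$ are available. The sketch below uses a hypothetical function name and invented values to illustrate the two special cases mentioned in Remark 6.4.

```python
import math

def annual_frequencies(n, p, theta):
    """Annual frequencies for Definitions 6.10, 6.11 and 6.12 (Eq. 6.9)."""
    mean_days = n * p                                   # mean number of days per year
    mean_episodes = n * p * theta                       # mean number of episodes per year
    prob_at_least_one = -math.expm1(-mean_episodes)     # 1 - exp(-n p theta)
    return mean_days, mean_episodes, prob_at_least_one

# Invented values: daily observations, clusters of mean length 1/theta = 2.5 days.
print(annual_frequencies(n=365, p=1e-3, theta=0.4))
# No temporal correlation in the extremes: the first two frequencies coincide.
print(annual_frequencies(n=365, p=1e-3, theta=1.0))
# Very rare event: the last two frequencies are almost equal.
print(annual_frequencies(n=365, p=1e-6, theta=0.4))
```

Using `math.expm1` avoids the loss of precision of `1 - exp(-x)` when `x` is very small, which is the typical situation for extreme events.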


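If the tail of the distribution has been modeled by studying the distribution of
cluster maxima, we then have access to approximation (6.3), which lets us write, for
the level $x$ of the event $\mathcal{E}$ being considered (supposed higher than the threshold $u$ of
the GPD $F_{\mathrm{GPD}}(\cdot \mid \xi, \sigma, u)$),

$$p = \bar F(x) \approx \bar\alpha\left(1 - F_{\mathrm{GPD}}\left[x - F^{-1}(\alpha) \mid \xi, \sigma, u\right]\right), \tag{6.10}$$

where $u = F^{-1}(\alpha)$ and $\bar\alpha = 1 - \alpha = P(X > u)$.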


Remark 6.5 (Links with the reliability of materials) The reliability of materials is
a domain in which there is often no temporal correlation ($\theta = 1$) and/or extremely
low failure rates. In such cases, the three definitions lead to the same result. In fact, 
we often use one of these definitions to define the annual failure rate of materials. 


6.4 Model Inference 


Numerous methods can be found in the literature for estimating GEV model param- 
eters under the MAXB approach, and GPD parameters under the POT approach. 
Here we present the maximum likelihood, weighted moments, and profile likelihood 
methods. These are easy to implement and lead to estimators with known asymptotic 
distributions, which means we can construct confidence intervals for each estimated 
parameter (see Sect. 4.2.2.3 for more details). 

Estimation of the shape parameter $\xi$ is extremely important, particularly in terms
of getting its sign (or nullity) correct. Note that when its confidence interval contains
0, we generally retain the model $\xi = 0$. It is essential to support this decision with
physical arguments from domain experts. In this chapter, in addition to the previous 
estimators, we will also describe the most famous one, Hill’s estimator [389]. 


6.4.1 Parameter Estimation for GEV Distributions 


Let us for clarity denote $Y$ the maximum of a statistical sample $X_1, \ldots, X_n$, a set
of independent copies of the same random variable $X$. We then denote $Y_1, \ldots, Y_m$
the statistical sample of such maxima. The aim of this section is to estimate the
parameters $(\sigma, \xi, \mu)$ of the GEV distribution of the maxima.


6.4.1.1 Maximum Likelihood 


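The log-likelihood function of a sample of maxima can be written, with the help of
the density of the GEV distribution, when $\xi \neq 0$, as

$$\log L(\sigma, \xi, \mu) = -m\log\sigma - \left(\frac{1}{\xi} + 1\right)\sum_{i=1}^{m}\log\left(1 + \xi\,\frac{Y_i - \mu}{\sigma}\right) - \sum_{i=1}^{m}\left(1 + \xi\,\frac{Y_i - \mu}{\sigma}\right)^{-1/\xi}, \tag{6.11}$$

if $1 + \xi\,\dfrac{Y_i - \mu}{\sigma} > 0$ for each $i = 1, \ldots, m$. When $\xi = 0$, the log-likelihood is given
by

$$\log L(\sigma, 0, \mu) = -m\log\sigma - \sum_{i=1}^{m}\exp\left(-\frac{Y_i - \mu}{\sigma}\right) - \sum_{i=1}^{m}\frac{Y_i - \mu}{\sigma}. \tag{6.12}$$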
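Equations (6.11) and (6.12) translate into a short function, sketched below with a hypothetical name and invented data. Returning minus infinity outside the support is a common convention for numerical optimizers and is an assumption of this sketch, as is the tolerance used to switch to the Gumbel case.

```python
import math

def gev_log_likelihood(y, sigma, xi, mu):
    """Log-likelihood of a sample of maxima y under GEV(sigma, xi, mu)."""
    if sigma <= 0.0:
        return -math.inf
    m = len(y)
    z = [(v - mu) / sigma for v in y]
    if abs(xi) < 1e-12:                                  # Gumbel case, Eq. (6.12)
        return -m * math.log(sigma) - sum(math.exp(-zi) for zi in z) - sum(z)
    t = [1.0 + xi * zi for zi in z]
    if min(t) <= 0.0:                                    # support constraint violated
        return -math.inf
    return (-m * math.log(sigma)                         # Eq. (6.11)
            - (1.0 / xi + 1.0) * sum(math.log(ti) for ti in t)
            - sum(ti ** (-1.0 / xi) for ti in t))

# Invented annual maxima, for illustration only.
maxima = [10.2, 11.5, 9.8, 13.1, 12.0]
print(gev_log_likelihood(maxima, sigma=2.0, xi=0.1, mu=10.0))
print(gev_log_likelihood(maxima, sigma=2.0, xi=0.0, mu=10.0))
print(gev_log_likelihood(maxima, sigma=1.0, xi=-0.5, mu=10.0))  # outside the support
```

Maximizing this function over $(\sigma, \xi, \mu)$ with a general-purpose optimizer would give the maximum likelihood estimates; the optimizer itself is left out of the sketch.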


For $\xi \neq 0$, the support of the GEV distribution depends on the estimated parameters;
the distribution is defined for all real-valued $x$ for which

$$1 + \xi\,\frac{x - \mu}{\sigma} > 0.$$


Thus, the usual conditions of regularity of the model underlying the asymptotic 
convergence properties of maximum likelihood estimators are no longer satisfied. 
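However, in the case where $\xi > -0.5$, we recover the convergence, asymptotic
efficiency, and asymptotic normality properties of the estimators:

$$\sqrt{m}\left((\hat\sigma, \hat\xi, \hat\mu) - (\sigma, \xi, \mu)\right) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, V_1) \tag{6.13}$$

when $m \to +\infty$ and $\xi > -0.5$. $V_1$ is the inverse of the Fisher information matrix
(for a single observation) defined by

$$I(\boldsymbol\theta) = -E\left[\frac{\partial^2 \log L}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta^T}\right] \tag{6.14}$$

where $\boldsymbol\theta = (\sigma, \xi, \mu)^T$. Thus, the components of $I(\boldsymbol\theta)$ are:

$$I_{1,1}(\boldsymbol\theta) = \frac{1}{\sigma^2\xi^2}\left(1 - 2\Gamma(2+\xi) + p\right)$$

$$I_{1,2}(\boldsymbol\theta) = -\frac{1}{\sigma\xi^2}\left(1 - \gamma - q + \frac{1 - \Gamma(2+\xi)}{\xi} + \frac{p}{\xi}\right)$$

$$I_{1,3}(\boldsymbol\theta) = -\frac{1}{\sigma^2\xi}\left(p - \Gamma(2+\xi)\right)$$

$$I_{2,2}(\boldsymbol\theta) = \frac{1}{\xi^2}\left[\frac{\pi^2}{6} + \left(1 - \gamma + \frac{1}{\xi}\right)^2 - \frac{2q}{\xi} + \frac{p}{\xi^2}\right]$$

$$I_{2,3}(\boldsymbol\theta) = -\frac{1}{\sigma\xi}\left(q - \frac{p}{\xi}\right)$$

$$I_{3,3}(\boldsymbol\theta) = \frac{p}{\sigma^2}$$

where $\gamma$ is Euler’s constant, $p = (1+\xi)^2\,\Gamma(1+2\xi)$, $q = \Gamma(2+\xi)\left(\psi(1+\xi) + \dfrac{1+\xi}{\xi}\right)$,
and $\psi(x) = d\log\Gamma(x)/dx$.


6.4.1.2 Weighted Moments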


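This method, introduced in [357], considers the following weighted moments, defined
for a random variable $Y$ with cumulative distribution function $F$:

$$M_{p,r,s} = E\left[Y^p\,(F(Y))^r\,(1 - F(Y))^s\right],$$

for real values $p$, $r$, and $s$. The special case of GEV distribution parameters uses
moments of type $M_{1,r,0}$ with $r = 0, 1, 2$, written as

$$M_{1,r,0} = \frac{1}{r+1}\left\{\mu - \frac{\sigma}{\xi}\left[1 - (r+1)^{\xi}\,\Gamma(1-\xi)\right]\right\}.$$

The underlying principle of the method is to express each parameter $(\sigma, \xi, \mu)$ as a
function of these moments by inverting equations (6.15)-(6.17)

$$M_{1,0,0} = \mu - \frac{\sigma}{\xi}\left(1 - \Gamma(1-\xi)\right) \tag{6.15}$$

$$2M_{1,1,0} - M_{1,0,0} = \frac{\sigma}{\xi}\,\Gamma(1-\xi)\left(2^{\xi} - 1\right) \tag{6.16}$$

$$\frac{3M_{1,2,0} - M_{1,0,0}}{2M_{1,1,0} - M_{1,0,0}} = \frac{3^{\xi} - 1}{2^{\xi} - 1} \tag{6.17}$$

Then, each weighted moment $M_{1,r,0}$ is estimated by the unbiased Landwehr estimator

$$\hat M_{1,r,0} = \frac{1}{m}\sum_{j=1}^{m}\left(\prod_{\ell=1}^{r}\frac{j - \ell}{m - \ell}\right)Y_{j,m},$$

where $Y_{j,m}$ is the $j$-th value of the sample $Y_1, \ldots, Y_m$ ordered from smallest to largest.
Equation (6.17), when solved numerically, gives an estimate $\hat\xi$ of $\xi$. Lastly, (6.15)
and (6.16) allow us to estimate the two other parameters as follows:

$$\hat\sigma = \frac{\hat\xi\left(2\hat M_{1,1,0} - \hat M_{1,0,0}\right)}{\Gamma(1-\hat\xi)\left(2^{\hat\xi} - 1\right)},$$

$$\hat\mu = \hat M_{1,0,0} + \frac{\hat\sigma}{\hat\xi}\left(1 - \Gamma(1-\hat\xi)\right).$$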
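The whole procedure (Landwehr estimates, numerical solution of (6.17), then back-substitution in (6.16) and (6.15)) can be sketched as below. The bisection solver, its search interval $[-5, 0.999]$ and all function names are choices made for this illustration; the text only says that (6.17) is solved numerically.

```python
import math

EULER_GAMMA = 0.5772156649015329

def landwehr_moment(y, r):
    """Unbiased Landwehr estimator of M_{1,r,0} from the sample y."""
    ys = sorted(y)
    m = len(ys)
    total = 0.0
    for j, v in enumerate(ys, start=1):
        w = 1.0
        for l in range(1, r + 1):
            w *= (j - l) / (m - l)
        total += w * v
    return total / m

def _ratio(xi):
    """Right-hand side of Eq. (6.17); its limit at xi = 0 is log(3)/log(2)."""
    if abs(xi) < 1e-9:
        return math.log(3.0) / math.log(2.0)
    return math.expm1(xi * math.log(3.0)) / math.expm1(xi * math.log(2.0))

def gev_from_weighted_moments(b0, b1, b2, lo=-5.0, hi=0.999):
    """Invert Eqs. (6.15)-(6.17); returns (sigma, xi, mu)."""
    target = (3.0 * b2 - b0) / (2.0 * b1 - b0)
    if not (_ratio(lo) < target < _ratio(hi)):
        raise ValueError("no solution for xi in the search interval")
    for _ in range(200):                 # bisection: _ratio is increasing in xi
        mid = 0.5 * (lo + hi)
        if _ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    if abs(xi) < 1e-9:                   # Gumbel limit of (6.15) and (6.16)
        sigma = (2.0 * b1 - b0) / math.log(2.0)
        mu = b0 - EULER_GAMMA * sigma
    else:
        g = math.gamma(1.0 - xi)
        sigma = xi * (2.0 * b1 - b0) / (g * math.expm1(xi * math.log(2.0)))
        mu = b0 + sigma / xi * (1.0 - g)
    return sigma, xi, mu

# Invented sample of annual maxima, for illustration only.
sample = [10.2, 11.5, 9.8, 13.1, 12.0, 10.9, 15.4, 11.1, 9.5, 12.7]
b = [landwehr_moment(sample, r) for r in range(3)]
print(gev_from_weighted_moments(*b))
```

With such a small sample the estimates are very uncertain; the example only shows how the pieces fit together.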
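The advantage of this method is that it can make use of well-known properties of
moment estimators to construct estimators of $(\sigma, \xi, \mu)$ with the same convergence
properties. Indeed, $\hat M = (\hat M_{1,0,0}, \hat M_{1,1,0}, \hat M_{1,2,0})^T$ is asymptotically
efficient and Gaussian provided that $\xi < 0.5$, i.e.,

$$\sqrt{m}\,(\hat M - M) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, V) \tag{6.18}$$

when $m \to +\infty$ and $\xi < 0.5$, where the entries of $V$ are written

$$v_{r,r} = \left(\frac{\sigma}{\xi}\right)^2 (r+1)^{2\xi}\left(\Gamma(1-2\xi)\,K\!\left(\frac{r}{r+1}\right) - \Gamma^2(1-\xi)\right)$$

$$v_{r,r+1} = \frac{1}{2}\left(\frac{\sigma}{\xi}\right)^2\left\{(r+2)^{2\xi}\,\Gamma(1-2\xi)\,K\!\left(\frac{r}{r+2}\right) + (r+1)^{\xi}\left[(r+1)^{\xi} - 2(r+2)^{\xi}\right]\Gamma^2(1-\xi)\right\}$$

$$\begin{aligned} v_{r,r+s} = \frac{1}{2}\left(\frac{\sigma}{\xi}\right)^2\Big\{ & (r+s+1)^{2\xi}\,\Gamma(1-2\xi)\,K\!\left(\frac{r}{r+s+1}\right) - (r+s)^{\xi}\,\Gamma(1-2\xi)\,K\!\left(\frac{r+1}{r+s}\right)\\ & + 2(r+1)^{\xi}\left[(r+s)^{\xi} - (r+s+1)^{\xi}\right]\Gamma^2(1-\xi)\Big\}, \quad s \geq 2, \end{aligned}$$

with $K(x) = {}_2F_1(-\xi, -2\xi; 1-\xi; -x)$, where ${}_2F_1$ is the hypergeometric function.

Denoting $\boldsymbol\theta = (\sigma, \xi, \mu)^T$ and $\hat{\boldsymbol\theta}$ its estimator, Eqs. (6.15)-(6.17) can be written
$\boldsymbol\theta = k(M)$. With $G$ the $3 \times 3$ matrix with entries $g_{i,j} = \partial k_i/\partial M_{1,j,0}$, we obtain the
convergence properties of $\hat{\boldsymbol\theta}$ from those of $\hat M$, i.e.

$$\sqrt{m}\,(\hat{\boldsymbol\theta} - \boldsymbol\theta) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, V_2) \tag{6.19}$$

when $m \to +\infty$ and $\xi < 0.5$, where $V_2 = G V G^T$.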


6.4.1.3 Confidence Intervals 


Using the Gaussian convergence property, we can construct asymptotic confidence
intervals of level $100(1 - \alpha)\%$ for each parameter of the GEV distribution, centered
on the estimates:

$$\hat\sigma \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{\hat v_{1,1}}{m}},$$

$$\hat\xi \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{\hat v_{2,2}}{m}},$$

$$\hat\mu \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{\hat v_{3,3}}{m}},$$

where $\hat v_{i,i}$ is the $i$-th diagonal entry of the matrix $V_1$ or $V_2$ (depending on the
estimation method used) when $(\sigma, \xi, \mu)$ is replaced by its estimate. Recall that $\Phi$ is the
cumulative distribution function of the standard Gaussian distribution.
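A minimal sketch of these intervals, using the Gaussian quantile function from Python's standard library. The function name and the numerical values (an estimate $\hat\xi = 0.1$, a diagonal entry $\hat v_{2,2} = 1.0$, and $m = 100$ maxima) are invented for illustration.

```python
import math
from statistics import NormalDist

def asymptotic_interval(estimate, v_ii, m, alpha=0.05):
    """Asymptotic confidence interval of level 100(1 - alpha)% for one
    parameter, given the corresponding diagonal entry v_ii of the
    asymptotic covariance matrix and the sample size m."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # Phi^{-1}(1 - alpha/2)
    half_width = z * math.sqrt(v_ii / m)
    return estimate - half_width, estimate + half_width

lower, upper = asymptotic_interval(estimate=0.1, v_ii=1.0, m=100)
print(lower, upper)
# If the interval for xi contains 0, the model xi = 0 is generally retained.
print(lower <= 0.0 <= upper)
```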


6.4.1.4 Profile Likelihood Method 


We can usually obtain faster convergence of estimators using the profile likelihood
method, introduced in [58]. The profile likelihood of $\xi$ is written as

$$L_p(\xi) = \max_{\sigma,\,\mu \mid \xi} L(\sigma, \xi, \mu).$$

The ratio defined by

$$\Lambda = \frac{L_p(\xi_0)}{L_p(\hat\xi)}$$

allows testing of the hypothesis $H_0 : \{\xi = \xi_0\}$ against the alternative $H_1 : \{\xi \neq \xi_0\}$.
Thus, for $m \to \infty$ and under $H_0$,

$$-2\log\Lambda \xrightarrow{\ \mathcal{L}\ } \chi^2_1,$$

where $\chi^2_1$ is the $\chi^2$ distribution with one degree of freedom. $H_0$ is rejected at level $\alpha$
if $-2\log\Lambda > \chi^2_1(1 - \alpha)$. The level $100(1 - \alpha)\%$ confidence interval for $\xi$ is thus
given by

$$IC_\xi = \left\{\xi : -2\log\frac{L_p(\xi)}{L_p(\hat\xi)} \leq \chi^2_1(1 - \alpha)\right\},$$

which can also be written as

$$IC_\xi = \left\{\xi : \log L_p(\xi) \geq \log L_p(\hat\xi) - \frac{\chi^2_1(1 - \alpha)}{2}\right\}.$$

When the 95% confidence interval contains 0, it is not possible to reject
$H_0 : \xi = 0$ at level $\alpha = 5\%$. In this case, we select the Gumbel distribution.

6.4.2 Parameter Estimation for GPD Distributions


Given a sample $X_1, \ldots, X_n$, we first extract the sample corresponding to exceedances
of the threshold $u$, which we call $Y_1, \ldots, Y_{N_u}$, where $Y_i = X_i - u$ if $X_i > u$, with
$N_u$ the number of times $u$ is exceeded.


6.4.2.1 Maximum Likelihood 


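The log-likelihood of the sample $Y_1, \ldots, Y_{N_u}$ can be written, with the help of the
density of the GPD distribution, by reparameterizing the problem as a function of
$(\tau, \xi)$, where $\tau = \xi/\sigma$:

$$\log L(\tau, \xi) = -N_u\log\xi + N_u\log\tau - \left(\frac{1}{\xi} + 1\right)\sum_{i=1}^{N_u}\log(1 + \tau Y_i) \tag{6.20}$$

if $1 + \xi\,\dfrac{Y_i}{\sigma} > 0$ for each $i = 1, \ldots, N_u$. Thus, the maximum likelihood estimator
$(\hat\tau, \hat\xi)$ is given by the first-order conditions of (6.20),

$$\hat\xi = \frac{1}{N_u}\sum_{i=1}^{N_u}\log(1 + \hat\tau Y_i),$$

where $\hat\tau$ solves

$$\frac{1}{\hat\tau} - \left(\frac{1}{\hat\xi} + 1\right)\frac{1}{N_u}\sum_{i=1}^{N_u}\frac{Y_i}{1 + \hat\tau Y_i} = 0.$$

The estimator $\hat\sigma$ can be deduced from $(\hat\tau, \hat\xi)$:

$$\hat\sigma = \frac{\hat\xi}{\hat\tau}.$$

The estimator $(\hat\tau, \hat\xi)$ is asymptotically Gaussian if $\xi > -1/2$, which if true implies
that $(\hat\sigma, \hat\xi)$ is too:

$$\sqrt{N_u}\left((\hat\sigma, \hat\xi) - (\sigma, \xi)\right) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, V_3)$$

when $N_u \to +\infty$ and $\xi > -0.5$, where

$$V_3 = (1 + \xi)\begin{pmatrix} 2\sigma^2 & -\sigma\\ -\sigma & 1 + \xi \end{pmatrix}.$$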
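
6.4.2.2 Weighted Moments


In the GPD case, the weighted moments method involves moments of type $M_{1,0,s}$
for $s = 0, 1$, given by

$$M_{1,0,s} = \frac{\sigma}{(s+1)(s+1-\xi)} \quad \text{for } \xi < 1,$$

and replacing $M_{1,0,s}$ by its empirical estimator

$$\hat M_{1,0,s} = \frac{1}{N_u}\sum_{j=1}^{N_u}\left(\prod_{\ell=1}^{s}\frac{N_u - j - \ell + 1}{N_u - \ell}\right)Y_{j,N_u},$$

where $Y_{j,N_u}$ is the $j$-th value of the sample $Y_1, \ldots, Y_{N_u}$ put in order of increasing
values. Inverting the system of equations defined by $M_{1,0,0}$ and $M_{1,0,1}$ leads to the
following estimators:

$$\hat\xi = 2 - \frac{\hat M_{1,0,0}}{\hat M_{1,0,0} - 2\hat M_{1,0,1}},$$

$$\hat\sigma = \frac{2\hat M_{1,0,0}\,\hat M_{1,0,1}}{\hat M_{1,0,0} - 2\hat M_{1,0,1}}.$$

The estimator $(\hat\sigma, \hat\xi)$ is asymptotically Gaussian if $\xi < 1/2$:

$$\sqrt{N_u}\left((\hat\sigma, \hat\xi) - (\sigma, \xi)\right) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, V_4) \tag{6.21}$$

when $N_u \to +\infty$ and $\xi < 0.5$, where

$$V_4 = C\begin{pmatrix} \sigma^2\left(7 - 18\xi + 11\xi^2 - 2\xi^3\right) & -\sigma(2-\xi)\left(2 - 6\xi + 7\xi^2 - 2\xi^3\right)\\ -\sigma(2-\xi)\left(2 - 6\xi + 7\xi^2 - 2\xi^3\right) & (1-\xi)(2-\xi)^2\left(1 - \xi + 2\xi^2\right) \end{pmatrix}$$

with $C = \dfrac{1}{(1 - 2\xi)(3 - 2\xi)}$.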

6.4.2.3 Confidence Intervals 


Due to their asymptotic convergence to Gaussian distributions, we can construct
asymptotic confidence intervals of level $100(1 - \alpha)\%$ for each of the GPD’s parameters,
centered at the estimates:

$$\hat\sigma \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{\hat v_{1,1}}{N_u}},$$

$$\hat\xi \pm \Phi^{-1}(1 - \alpha/2)\sqrt{\frac{\hat v_{2,2}}{N_u}},$$

where $\hat v_{i,i}$ is the $i$-th diagonal element of the matrix $V_3$ or $V_4$, depending on which
estimation method is used, when replacing $(\sigma, \xi)$ by its estimate.


6.4.2.4 Profile Likelihood Method 


As for the estimation of GEV parameters, we can generally obtain faster convergence
of estimators using the profile likelihood method, detailed in the previous section.
For GPD models, the profile likelihood of $\xi$ is given by

$$L_p(\xi) = \max_{\sigma \mid \xi} L(\sigma, \xi).$$

The level $100(1 - \alpha)\%$ asymptotic confidence interval for $\xi$ is thus defined by

$$IC_\xi = \left\{\xi : \log L_p(\xi) \geq \log L_p(\hat\xi) - \frac{\chi^2_1(1 - \alpha)}{2}\right\}.$$

The special case where $\xi = 0$ is treated in [502].
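For the GPD, the profile likelihood only requires a one-dimensional maximization over $\sigma$ for each value of $\xi$, which makes it easy to sketch. The code below is one possible implementation under stated assumptions: for fixed $\xi > -1$ the maximizing $\sigma$ is found by bisection on the score equation written in $s = 1/\sigma$, the interval is read off a grid of $\xi$ values from $-0.5$ to $1$ (assuming the retained set is an interval), and the $\chi^2_1$ quantile is obtained as the square of a Gaussian quantile. All names, the grid and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist

def gpd_loglik(y, sigma, xi):
    """GPD log-likelihood of the excesses y (all positive)."""
    n = len(y)
    if abs(xi) < 1e-9:                           # exponential limit, xi = 0
        return -n * math.log(sigma) - sum(y) / sigma
    t = [1.0 + xi * v / sigma for v in y]
    if min(t) <= 0.0:                            # outside the support
        return -math.inf
    return -n * math.log(sigma) - (1.0 / xi + 1.0) * sum(math.log(ti) for ti in t)

def profile_loglik(y, xi):
    """log L_p(xi): maximum over sigma of the log-likelihood, for xi > -1."""
    n = len(y)
    if abs(xi) < 1e-9:
        return gpd_loglik(y, sum(y) / n, 0.0)    # sigma_hat is the sample mean
    # Score equation in s = 1/sigma; h is increasing on the admissible range.
    def h(s):
        return (1.0 + xi) * sum(v * s / (1.0 + xi * v * s) for v in y) - n
    lo = 0.0
    if xi > 0:
        hi = n / sum(y)
        while h(hi) < 0.0:
            hi *= 2.0
    else:
        hi = (1.0 - 1e-12) / (-xi * max(y))      # keeps 1 + xi * y * s > 0
    for _ in range(60):                          # bisection on h(s) = 0
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return gpd_loglik(y, 1.0 / (0.5 * (lo + hi)), xi)

def profile_interval(y, alpha=0.05, xi_grid=None):
    """Grid-based profile likelihood interval; returns (xi_hat, lower, upper)."""
    if xi_grid is None:
        xi_grid = [-0.5 + 0.01 * i for i in range(151)]
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2     # chi2_1 quantile of order 1 - alpha
    values = [profile_loglik(y, xi) for xi in xi_grid]
    best = max(values)
    kept = [xi for xi, v in zip(xi_grid, values) if v >= best - q / 2.0]
    return xi_grid[values.index(best)], min(kept), max(kept)

# Synthetic excesses: exact GPD quantiles for sigma = 2 and xi = 0.2.
n = 200
y = [2.0 / 0.2 * ((1.0 - i / (n + 1)) ** (-0.2) - 1.0) for i in range(1, n + 1)]
print(profile_interval(y))
```

The resolution of the interval is limited by the grid step (0.01 here); a finer grid or a root search on the profile curve would refine the endpoints.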


6.4.3 Details on Hill’s Estimator for ξ


Let us now take a closer look at the estimation of $\xi$. Hill’s estimator [389] is the most
well-known estimator for this task. It is defined by

$$\hat\xi_{k,n} = \frac{1}{k}\sum_{j=1}^{k}\log X_{n-j+1,n} - \log X_{n-k,n},$$

with $X_{1,n}, \ldots, X_{n,n}$ the order statistics associated with the sample $(X_1, \ldots, X_n)$, i.e.,
$X_{1,n}$ is the smallest value and $X_{n,n}$ the largest of the full sample. Hill’s estimator is
used often for reasons such as those listed below.

• If we denote $E_{j,t}$ the relative excess with respect to $t$, i.e., $E_{j,t} = \dfrac{X_j}{t}$ with $X_j > t$,
it can be shown that

$$P\left(E_{j,t} > x \mid E_{j,t} > 1\right) \xrightarrow[t \to \infty]{} x^{-1/\xi} \quad \text{when } x > 1.$$

Using the statistical likelihood based on this limit distribution, we can show that
Hill’s estimator is none other than the maximum likelihood estimator in the case
where $t = X_{n-k,n}$ (i.e., the threshold is the $(k+1)$-th largest observation).

• Furthermore, Hill’s estimator can be interpreted visually, which is of particular
importance to practitioners. Indeed, if we look at the plot with the points

$$\left(\log\frac{n+1}{j},\ \log X_{n-j+1,n}\right),$$

which is called the Pareto quantile plot, then, when the distribution is Pareto,
this plot will be approximately linear in its extreme points, with slope $\xi$. Hill’s
estimator is then nothing more than a naive estimator of this slope, and thus of $\xi$.
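Hill's estimator is simple enough to sketch in a few lines. The function name is hypothetical, and the synthetic sample below consists of exact Pareto quantiles with tail index $\xi = 0.5$, chosen so that the result is deterministic.

```python
import math

def hill_estimator(x, k):
    """Hill's estimator of xi based on the k largest values of x.
    Requires 1 <= k < n and a positive threshold X_{n-k,n}."""
    xs = sorted(x)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]                    # X_{n-k,n}
    if threshold <= 0.0:
        raise ValueError("the threshold must be positive")
    # Mean of log X_{n-j+1,n} for j = 1..k, minus log of the threshold.
    mean_log = sum(math.log(xs[n - j]) for j in range(1, k + 1)) / k
    return mean_log - math.log(threshold)

# Exact Pareto quantiles: X = (1 - p)^(-xi) with xi = 0.5.
n = 2000
sample = [(1.0 - i / (n + 1)) ** (-0.5) for i in range(1, n + 1)]
for k in (50, 200, 500):
    print(k, hill_estimator(sample, k))
```

In practice the estimate is plotted against $k$ (a so-called Hill plot) to look for a range of $k$ over which it is stable; this usage is a common convention rather than something stated in the text above.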


Asymptotic properties of Hill’s estimator were established, for consistency, by 
[210, 507], and for asymptotic convergence to a Gaussian by [67, 188]. 

The main disadvantage of this estimator is that it only works for positive $\xi$, and
is therefore restricted, theoretically speaking, to the Fréchet domain of attraction.
Various generalizations have therefore been proposed. Among these, let us highlight,
in particular, the moments estimator from [211] and the UH estimator from [69].
Readers can also find in the thesis [746] goodness-of-fit test statistics for the Pareto
distribution, in particular for the parameter $\xi$.

Recent work on the properties of Hill’s estimator has shown that under certain 
regularity conditions, it can be written as a function of the largest order statistics of 
a sample from the exponential distribution [104]. The choice of the number of order 
statistics to use in the calculation can be determined as a function of the precision 
of the estimator of $\xi$ [742]. This precision is non-asymptotic and can be calculated
using concentration inequalities [508]. 


6.4.4 Estimating the Extremal Index θ


When data has temporal correlation, it is necessary to estimate the extremal index $\theta$
measuring the strength of this correlation.

Several estimators have been proposed for the extremal index. Ferro and Segers
[273] have shown that $T_\theta$, representing the duration between successive exceedances,
converges in distribution (when the threshold corresponds to a larger and larger
quantile) to a distribution with mass $1 - \theta$ at $t = 0$ and an exponential distribution
in its continuous part $t > 0$. Its coefficient of variation $\nu$, defined as the ratio of its
standard deviation to its expectation, then satisfies

$$1 + \nu^2 = \frac{E\left[T_\theta^2\right]}{\left(E[T_\theta]\right)^2} = \frac{2}{\theta}.$$



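If $N$ is the number of exceedances of a threshold $u$ by the data series and $T_1, \ldots, T_{N-1}$
the inter-event times, the proposed estimator is given by

$$\tilde\theta_n(u) = \begin{cases} \min\left(1, \hat\theta_n(u)\right) & \text{if } \max\{T_i : 1 \leq i \leq N-1\} \leq 2,\\ \min\left(1, \hat\theta_n^*(u)\right) & \text{if } \max\{T_i : 1 \leq i \leq N-1\} > 2, \end{cases} \tag{6.22}$$

with

$$\hat\theta_n(u) = \frac{2\left(\sum_{i=1}^{N-1}T_i\right)^2}{(N-1)\sum_{i=1}^{N-1}T_i^2}$$

and

$$\hat\theta_n^*(u) = \frac{2\left[\sum_{i=1}^{N-1}(T_i - 1)\right]^2}{(N-1)\sum_{i=1}^{N-1}(T_i - 1)(T_i - 2)}.$$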

Unlike many other estimators, this one has the important property of guaranteeing
an estimate of $\theta$ between 0 and 1.
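Estimator (6.22) can be sketched directly from the times at which the threshold is exceeded. The function name is hypothetical and the exceedance times below are invented; times are assumed to be integer indices (e.g., day numbers) given in increasing order.

```python
def intervals_estimator(exceedance_times):
    """Intervals estimator (6.22) of the extremal index, computed from the
    increasing integer times at which the threshold u is exceeded."""
    gaps = [b - a for a, b in zip(exceedance_times, exceedance_times[1:])]
    if not gaps:
        raise ValueError("at least two exceedances are required")
    n_gaps = len(gaps)                                   # N - 1 inter-event times
    if max(gaps) <= 2:
        num = 2.0 * sum(gaps) ** 2
        den = n_gaps * sum(t * t for t in gaps)
    else:
        num = 2.0 * sum(t - 1 for t in gaps) ** 2
        den = n_gaps * sum((t - 1) * (t - 2) for t in gaps)
    return min(1.0, num / den)

# Invented exceedance days: three clusters of unequal size.
print(intervals_estimator([1, 2, 3, 10, 11, 30]))
# Well-separated exceedances: the estimate is capped at 1.
print(intervals_estimator([5, 15, 45, 65]))
```

Taking the minimum with 1 is what keeps the estimate within the admissible range for an extremal index.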


6.4.5 Model Validation 


Validation of an inferred model requires the use of several informative plots. It is
always useful, if not essential, to look carefully at the following ones.


• QQ-plot: the empirical quantiles from the data are compared with the theoretical
quantiles of the proposed distribution. A good fit corresponds to a straight line
across the whole range of quantiles.

• Probability-plot: this plot compares the empirical ranks of the data with the
theoretical ones from the proposed model. A good model fit again corresponds to a
straight line.

• Return levels graph: this graph shows how return levels vary with the return
period, as estimated both from the proposed model and from the data.

• Density graph: this graph compares the data histogram with the density of the
estimated (inferred) model.


An illustration of each of these plots is given in Sect.6.6.2 for an analysis of 
extreme wind speeds. 


6.5 Study Steps


In this section, we describe the main steps of a typical univariate extreme value
study. The aim is not to list in full the set of statistical tools available at each step,
but rather to detail a few common methods. This description is illustrated with two
examples:


• a time series of daily flow measurements of the Loire river from 01/08/1958 to
31/12/2011, and
• a time series of daily wind speed measurements from 01/08/1981 to 30/04/2011.


6.5.1 Obtaining a Stationary Process 


Using modeling theorems requires a stationary process. However, especially when 
dealing with processes related to environmental variables, this hypothesis is unlikely 
to be satisfied. For example, a river flow is clearly seasonal, which leads to a non- 
stationary process over the course of a year. In general, non-stationarity in a process 
may be due to a change point (e.g., the date at which measurements began, or a 
change in the environment leading to a significant impact on the process), a trend 
(e.g., air temperature increases due to a warming planet), or seasonality. 

The presence of a trend is specifically dealt with in Chap. 8 of this book. 

As for seasonality, it suffices to restrict the study of the process to a period of the
year in which it can be considered stationary. This period must necessarily contain
the extreme values. Hence, for instance, in the study of river flooding, we retain
only the fall and winter flow measurements.

A precise determination of the stationary period can be made using physical and 
statistical criteria, e.g., the calculation of monthly empirical means and variances; 
the stationary period will be that in which these values are the most similar, thus 
ensuring second order stationarity. Statistical tests can also be used to help validate 
stationarity quantitatively (see Chap. 3 for more details). Particular examples are the 
uniform distribution test for threshold exceedance dates, and the Mann-Kendall test 
for detecting trends. Other tests are specifically designed to compare the likelihood 
of a model with constant parameters to one with time-dependent parameters. These 
tests are based on the distribution of the deviance D, defined by 


$$D = 2(\ell_1 - \ell_0),$$


where $\ell_1$ is the maximized log-likelihood of model $M_1$ and $\ell_0$ that of
model $M_0$. The deviance can then be compared with a $\chi^2_k$ distribution, where $k$ is
the difference in the number of parameters between the two models (cf. Theorem 8.1).
Example 4.2 in Chap. 2 details the use of this test for deciding on the best model.
For a recent look at constructing tests adapted to detecting models with variable 
parameters, see [264]. 
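
The deviance test described above can be sketched as follows. The log-likelihood values are hypothetical, and the helper `chi2_sf` (upper tail probability of a chi-square distribution with an integer number of degrees of freedom) is written by hand for this example so that only the standard library is needed; in practice a statistical library would normally be used instead.

```python
import math


def chi2_sf(x, k):
    """P(X > x) for X following a chi-square distribution with integer k >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a exp(-y) / Gamma(a + 1)
    while a < k / 2.0 - 1e-12:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def deviance_test(loglik_0, loglik_1, k):
    """Return the deviance D = 2 (l1 - l0) and its chi-square p-value."""
    d = 2.0 * (loglik_1 - loglik_0)
    return d, chi2_sf(d, k)


# Hypothetical maximized log-likelihoods: M0 (constant parameters) and
# M1 (one additional time-dependent parameter, hence k = 1).
D, p_value = deviance_test(-120.0, -117.0, k=1)
```

A small p-value suggests that the model with time-dependent parameters fits significantly better, i.e., that stationarity is questionable.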

Fig. 6.3 Daily river flow measurements from 01/08/1920 to 31/12/2011 (top) and a zoom on year
1981


Let us illustrate this step to determine the stationary period using the time series 
of river flows. Figure 6.3 shows these, along with a zoom in to the year 1981 to better 
show the annual river flow cycle. 

Figure 6.4 provides a visual representation of the monthly river flow values. The
boxes contain 50% of the values, spread out around the median (the solid line inside
each box). Extreme values are shown as circles. A second order stationary process
corresponds to constant median and variance, the latter being reflected by the height
of the boxes. This graph suggests that the river flow can reasonably be considered a
stationary process when restricted to the months December to May.

Fig. 6.4 Daily river flow measurements, boxplotted per month; the process can be considered
stationary from December to May

Table 6.1 Monthly distribution of the five highest flows of each year between 1958 and 2011

Month             J    F    M    A    M    J    J    A    S    O    N    D
Number of flows   75   55   28   14   28   2    0    0    0    2    21   45

It is also useful to study how extreme values of interest are spread out over the 
year by calculating, for instance, the monthly distribution of the ten most extreme 
values from each year of measurements. This can help better see which months 
extreme natural events occur in, which complements expert knowledge on the subject 
(Table 6.1). 
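
A table of this kind can be computed with a short routine such as the one below. It is a sketch under simple assumptions: the data are given as `(date, value)` pairs, and the function name `monthly_counts_of_top_values` and the toy records are ours.

```python
import datetime as dt
from collections import Counter, defaultdict


def monthly_counts_of_top_values(records, k=5):
    """Count, month by month, the k largest values of each calendar year.

    `records` is an iterable of (date, value) pairs.
    Returns a list of 12 counts (January first).
    """
    by_year = defaultdict(list)
    for day, value in records:
        by_year[day.year].append((value, day.month))
    counts = Counter()
    for year_values in by_year.values():
        top = sorted(year_values, reverse=True)[:k]
        counts.update(month for _, month in top)
    return [counts.get(m, 0) for m in range(1, 13)]


# Toy records (hypothetical values), keeping the two largest values per year.
records = [
    (dt.date(2000, 1, 1), 10.0), (dt.date(2000, 1, 2), 9.0), (dt.date(2000, 7, 1), 1.0),
    (dt.date(2001, 2, 1), 5.0), (dt.date(2001, 12, 1), 7.0), (dt.date(2001, 6, 1), 0.5),
]
counts = monthly_counts_of_top_values(records, k=2)
```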


6.5.2 Extracting Annual Maxima and Cluster Maxima 


Once the process has been restricted to a period where it can reasonably be consid- 
ered stationary, the annual maxima subprocess—whose values are asymptotically 
distributed according to one of the three GEV distributions—can be extracted. 

In the POT approach, the analysis takes place on the cluster maxima. It is, there- 
fore, necessary to extract independent clusters from the initial data time series, i.e., 
groups of close-together values exceeding some high threshold. In this way, we 
obtain the cluster maxima subprocess. This step is called declustering and must be 
performed carefully because the rest of the study strongly depends on it. An overly- 
low choice of threshold will lead to a modeling error, while an overly-high one means 
poor parameter estimation (cf. Sect. 6.2.3.1). 


6.5.2.1 Declustering Procedure


For a chosen threshold $u_s$, a practitioner has to identify independent clusters. Several
methods exist to do so [88, 686, 737], generally based on bias-variance tradeoffs in
estimated functions of interest [646]. However, the most common is based on the
so-called redescent criterion [433, 434], whereby two exceedances are considered
independent when separated by at least $r$ values below the threshold. The value of
$r$ may be dictated by physical considerations. For example, for certain high-water
events, experts suggest a redescent value of $r = 10$ days. See also Example 5.11.
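
The redescent criterion can be sketched as follows, assuming regularly spaced observations (e.g., daily values). The function name `decluster` and the toy series are illustrative only.

```python
def decluster(series, threshold, r):
    """Redescent (runs) declustering.

    Two exceedances belong to different clusters when they are separated by
    at least r consecutive values that do not exceed the threshold.
    Returns the list of (index, value) of the cluster maxima.
    """
    maxima = []
    best = None   # (index, value) of the maximum of the current cluster
    below = 0     # consecutive non-exceedances since the last exceedance
    for i, x in enumerate(series):
        if x > threshold:
            if best is not None and below >= r:
                maxima.append(best)   # the previous cluster is closed
                best = None
            if best is None or x > best[1]:
                best = (i, x)
            below = 0
        else:
            below += 1
    if best is not None:
        maxima.append(best)
    return maxima


flows = [0, 5, 6, 0, 7, 0, 0, 0, 4, 0]          # toy series
clusters_r2 = decluster(flows, threshold=3, r=2)
clusters_r1 = decluster(flows, threshold=3, r=1)
```

With `r = 2` the first three exceedances are merged into a single cluster, whereas `r = 1` splits them in two; this shows how strongly the cluster maxima subprocess depends on the choice of $r$.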


Choosing the threshold. In practice, validation of the declustering step (choice
of threshold $u_s$ and formation of independent clusters) is performed by checking
the probabilistic properties stemming from the Poissonian nature of cluster maxima
and the threshold-stability property of GPD distributions. Thus, the sequence of cluster
maxima must be such that:


• they are independent,
• their number per year follows a Poisson distribution, and
• the waiting time between consecutive ones follows an exponential distribution.


Validating declustered data. A great number of quantitative and graphical
statistical tools are available to do this (see Chap. 5 for more details). In particular,
the Kolmogorov-Smirnov test, which assesses the plausibility of the data under the null
hypothesis $H_0$ of a continuous distribution $F_0(x)$, is often used to test for an
exponential distribution of the waiting times between successive cluster maxima (see
Example 4.1), with the help of the empirical distribution (Fig. 6.5). The equivalent
test for discrete distributions is the $\chi^2$ test, which is used to test the pertinence of a
Poisson distribution for the number of clusters per year.
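
As a hedged illustration, the Kolmogorov-Smirnov distance between the waiting times and an exponential distribution can be computed as below. The rate is estimated from the data here, which is an assumption of this sketch: in that case the usual large-sample 5% bound of about $1.36/\sqrt{n}$ is only indicative (it tends to be conservative). The function name and the synthetic sample are ours.

```python
import math


def ks_statistic_exponential(waiting_times):
    """Kolmogorov-Smirnov distance to an exponential distribution.

    The rate is estimated by the inverse of the sample mean.
    Returns (distance, estimated_rate).
    """
    x = sorted(waiting_times)
    n = len(x)
    rate = n / sum(x)
    d = 0.0
    for i, t in enumerate(x):
        f = 1.0 - math.exp(-rate * t)              # hypothesized distribution function
        d = max(d, (i + 1) / n - f, f - i / n)     # distance just after and just before t
    return d, rate


# Synthetic waiting times placed at exponential quantiles (illustrative only).
n = 50
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
distance, rate = ks_statistic_exponential(sample)
indicative_bound = 1.36 / math.sqrt(n)
```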

Fig. 6.6 QQ-plot of a dataset simulated under a GEV distribution. The x-axis shows empirical
quantiles and the y-axis theoretical quantiles

Fig. 6.7 Modeling extreme flows. Selecting cluster maxima over the threshold
$u_s = 1773$ m$^3$.s$^{-1}$

Graphical goodness-of-fit tests for a given model are easy to interpret. The most
well-known one is the QQ-plot (i.e., “quantile-quantile”) which plots the empirical 
quantiles from the data and compares them with the theoretical ones of the pro- 
posed distribution. A good fit between the data and the distribution would mean an 
essentially straight line across the whole range of quantiles (Fig. 6.6). 

The autocorrelation function (cf. Sect. 5.4.1.1) plots the temporal empirical
correlation of the data between times $t$ and $t + k$ for different lags $k$. In the example
considered in Fig. 6.8, the confidence interval delimited by the two horizontal lines
gives a region inside which the correlation coefficient is not statistically different from
0. Independent data should lead to correlations found in this interval. This graphical
test is generally used to verify independence in cluster maxima.
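
The empirical autocorrelations and the horizontal band can be sketched as follows. The band $\pm 1.96/\sqrt{n}$ is the usual approximate 95% band for independent data; whether Fig. 6.8 uses exactly this band is an assumption of this sketch, and the function name is ours.

```python
import math


def autocorrelation(x, max_lag):
    """Empirical autocorrelations at lags 1..max_lag and an approximate 95% band."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
        acf.append(ck / c0)
    bound = 1.96 / math.sqrt(n)
    return acf, bound


# A strongly dependent toy series: its autocorrelations fall far outside the band.
series = [1.0, -1.0] * 10
acf, bound = autocorrelation(series, max_lag=2)
outside = [k + 1 for k, rho in enumerate(acf) if abs(rho) > bound]
```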

Fig. 6.8 Modeling extreme flows. Independence of cluster maxima: autocorrelation function

Fig. 6.9 Modeling extreme flows. Waiting times between successive cluster maxima (QQ-plot for
the exponential distribution)


6.5.2.2 Practical Example


Flow data. We now illustrate the types of results that can be obtained using the flow
time series shown in Fig. 6.3. Figure 6.7 plots the maxima of the clusters retained. 
The results in Fig. 6.8 allow us to test for cluster maxima independence using the
autocorrelation function; the estimated correlations between data at times $t$ and $t + k$
for several lags $k$ are all indistinguishable from 0 except one ($k = 6$). It is, therefore,
reasonable to consider the hypothesis of independent cluster maxima plausible.
The QQ-plot, which allows us to test whether the waiting times between successive
cluster maxima follow an exponential distribution, does not invalidate the above
choice (Fig. 6.9): the plot clearly shows a good fit between the empirical quantiles
of the data and those from the proposed exponential distribution.

In addition, the Poisson distribution for the annual number of clusters can also
be validated: Fig. 6.10 plots the observed values as well as those from the estimated
distribution. The test for a uniform distribution of the dates at which cluster maxima
occur (Fig. 6.11) also appears to show that this hypothesis is acceptable.

Fig. 6.10 Modeling extreme flows. Annual number of cluster maxima (empirical distribution and
estimated Poisson distribution)

Fig. 6.12 Daily wind speed measurements from 01/08/1981 to 30/04/2011

Wind speed data. We now illustrate the stability properties of GPD distribution
parameters on the wind speed data. This time series is shown in Fig. 6.12 and covers
the period 1981-2011. The data has been analyzed in the same way as the flow data.
We concentrate on other aspects of the methodology here; notably, we focus on
choosing the threshold above which exceedances will be taken into account.

First, we illustrate the stability property of GPD distribution parameters given in
Proposition 6.3. Figure 6.13 plots the scale $\sigma^*(u)$ and shape $\xi$ parameters as a function
of $u$ from 29 m.s$^{-1}$ to 31 m.s$^{-1}$. The stability of the curves indicates that $u_s = 29$ m.s$^{-1}$
is an acceptable choice. Next, Fig. 6.14 plots the conditional expectation of exceeding
$u$ for $u > 29$ m.s$^{-1}$. A certain amount of linearity can be seen starting from 29 m.s$^{-1}$,
which allows us to validate the choice of $u_s$, according to Proposition 6.4. Figure 6.15
plots the cluster maxima retained above $u_s = 29$ m.s$^{-1}$.
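
The linearity check mentioned above can be sketched as follows. The empirical mean excess is computed from the data; for comparison, we also give the mean excess of a GPD fitted at threshold $u_s$, which for $\xi < 1$ is the linear function $(\sigma + \xi (u - u_s))/(1 - \xi)$. This is a standard property of the GPD that we assume corresponds to the statement of Proposition 6.4. Function names and the toy values are ours.

```python
def mean_excess(values, thresholds):
    """Empirical mean of (value - u) over the values exceeding u, for each u."""
    out = []
    for u in thresholds:
        excesses = [v - u for v in values if v > u]
        out.append(sum(excesses) / len(excesses) if excesses else float("nan"))
    return out


def gpd_mean_excess(u, u_s, sigma, xi):
    """Mean excess over u >= u_s for a GPD(u_s, sigma, xi) with xi < 1."""
    return (sigma + xi * (u - u_s)) / (1.0 - xi)


# Toy data (hypothetical values).
empirical = mean_excess([1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 2.0, 4.5])

# Theoretical line implied by the estimates of Table 6.2 (u_s = 29 m/s).
line_at_29 = gpd_mean_excess(29.0, 29.0, 2.546, -0.315)
line_at_30 = gpd_mean_excess(30.0, 29.0, 2.546, -0.315)
```

A negative shape parameter gives a decreasing line, which is consistent with a bounded upper tail for wind speeds.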


6.6 Fitting a GPD Model to Wind Speed Data 


Remember that any estimate must be accompanied by a confidence interval allowing
us to quantify the potential statistical error due to having a limited number of
data (see Sect. 4.13). However, be careful: such intervals are based on the asymptotic
distributions of estimators, which supposes that the number of data used in the
estimation is relatively large.

The study of the wind speed time series is summarized in Table 6.2, giving the
estimated parameters for the GPD distribution of cluster maxima exceedances of the
threshold $u_s = 29$ m.s$^{-1}$, along with the 90% confidence intervals, noted $[a_{90}, b_{90}]$.

Fig. 6.13 Modeling extreme wind speeds. Stability of the extremal index and parameters of the
generalized Pareto distribution


6.6.1 Estimating the Extremal Index $\theta$


We estimate $\theta$ with the help of the estimator proposed by [273] (see Eq. 6.22). The
relative stability of the estimate $\hat\theta(u)$ seen in the first plot in Fig. 6.13 for
thresholds around 29 m.s$^{-1}$ validates the value $\theta \approx 0.63$ for the wind speed data.
Cluster size is therefore estimated at $1/\theta \approx 1.6$ days, i.e., less than two days.


6.6.2 Validating the Inferred Model 


To validate the choice of model, we look at the probability-plots and return level and 
density graphs, whose definitions were given earlier. 

Fig. 6.14 Modeling extreme wind speeds. Evolution of the conditional expectation of exceeding the
threshold $u$ by independent cluster maxima above the value $u_s = 29$ m.s$^{-1}$

Table 6.2 Fitting a generalized Pareto distribution to independent cluster maxima exceeding the
threshold $u_s = 29$ m.s$^{-1}$ for the wind speed data

Parameter   Estimation   St. dev.   $a_{90}$   $b_{90}$
$\xi$       -0.315       0.0879     -0.488     -0.143
$\sigma$    2.546        0.334      1.892      3.201

For illustration’s sake, the graphs for the wind speed analysis are given in
Fig. 6.16. The quite linear form of the first two, and the good fit to the data seen
in the others, mean that the model selection and inference performed are credible.


6.6.3 Estimating Return Periods and Annual Frequencies 


The distribution of cluster maxima allows us to determine the levels of $T$-year return
periods using (6.7). These estimates come with confidence intervals, denoted IC,
based on the asymptotic convergence result of the maximum likelihood estimator
recalled in Eq. 6.8.

For the wind speed time series, we find the following centennial level, along with
its 95% confidence interval: $v_{100} = 36.6$ m.s$^{-1}$, and $IC_{95\%}$ = [34.6 m.s$^{-1}$; 38.5 m.s$^{-1}$].

The annual frequencies as defined earlier and calculated according to formulas
(6.9) for the event of exceeding the centennial level are the following:


• the frequency defined by the mean yearly number of days exceeding $v_{100}$:
$F^{(1)}(E) = 1.59 \times 10^{-2}$,
• the frequency defined by the mean yearly number of episodes in which $v_{100}$ is
exceeded: $F^{(2)}(E) = 10^{-2}$, and
• the frequency defined by the probability of having at least one episode in which
$v_{100}$ is exceeded in a given year: $F^{(3)}(E) = 9.95 \times 10^{-3}$.

Fig. 6.15 Modeling extreme wind speeds. Selecting cluster maxima above the threshold
$u_s = 29$ m.s$^{-1}$
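
The three frequencies are linked by simple relations, which the following sketch makes explicit. We assume here (this is our reading, since formulas (6.9) are not repeated in this section) that episodes occur according to a Poisson process and that each episode lasts $1/\theta$ days on average, with $\theta \approx 0.63$ as estimated in Sect. 6.6.1.

```python
import math

theta = 0.63               # extremal index estimated for the wind speed data
episodes_per_year = 1e-2   # mean yearly number of episodes exceeding the 100-year level

# Mean yearly number of days above the level: episodes times mean cluster size 1/theta.
days_per_year = episodes_per_year / theta

# Probability of at least one episode in a given year under the Poisson assumption.
prob_at_least_one = 1.0 - math.exp(-episodes_per_year)
```

Under these assumptions, `days_per_year` is about 0.0159 and `prob_at_least_one` about 0.00995.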


6.7 Conclusions and Perspectives 


This chapter has presented the main results in classical univariate extreme value 
theory, illustrated with recent studies undertaken by engineers and researchers at 
EDF. We have emphasized the “engineering” side of things and the data preprocessing 
required to access the theoretical framework for stationary processes. Such analyses 
can also be found in tutorials made freely available by engineers and researchers (see 
for example [239]). 

The theory is, however, much broader than what we have shown, and contains
numerous other results. In particular, the study of the colored Poisson process, where 
the color is the cluster size, makes it possible to estimate the distribution of cluster 
lengths. This model gives an idea of the severity of the natural hazard, quantified by 
the duration of above-threshold episodes. Moreover, the study of specific stationary 
processes such as Markov chains, where the current instant depends on only a finite 
number of past instances, is also of interest. Indeed, this type of modeling is frequently 
used for variables related to the environment (e.g., autoregressive processes such as 
ARMA for studying how the temperature evolves). 

Even though it is a well-studied setting, univariate extreme value theory is still a
subject of active research, notably in the following areas, the importance of which
is clear from the current chapter: estimating the shape parameter $\xi$, estimating the
extremal index $\theta$, declustering methods [288, 369], etc. Interested readers can find
more information on this subject in Chap. 12. The study of non-stationary processes
is also an active domain, especially the extension of point processes to this setting
[402, 403].

Fig. 6.16 Modeling extreme wind speed data. Graphical validation of the modeling of cluster
maxima exceedances: a probability-plot, b QQ-plot, c empirical and model density graph, d return
levels graph

The shortage of data is a major problem which strongly impacts the precision of 
model inference. To overcome this, a variety of methods can be tried. One possibility 
is the regional approach presented in Chap.7. This method permits us to combine 
data coming from different locations using a data homogenization approach. This 
strongly increases the quantity of data available and improves the convergence of 
estimators and models. 

An alternative method is to take potentially very old historical data into account.
Such data holds precious information on extreme events which are not part of a 
systematic time series, and can be used in model inference. However, the best way 
to include it is not always obvious. A Bayesian approach (Chap. 11) is a possible 
solution to this. 

Yet another strategy is to artificially increase the number of data by simulating it 
using a coded model of the physical processes involved. The challenge is, therefore, 
to be able to model the physics well and produce simulated data that looks like it 
comes from the true process, including its extreme values, both in terms of actual 
values and frequency. The second part of Chap. 10 provides some details on this. 

To conclude, we now arrive at the question asked by all extreme value analysts:
“Up to what point is it reasonable to extrapolate an extreme value distribution?”
In other words, is it reasonable to calculate—starting from a GPD or GEV model 
estimated with the help of n data points—the level of any return period? 

The answer to this question is not immediately obvious. Clearly, it depends on 
the level of precision underlying the reasoning used, i.e., the precision in estimated 
return levels, generally in terms of the length of confidence intervals (or credibility 
intervals if this notion is correctly interpreted—see Chap. 11). 

It also depends on the quantity of data, its quality, the rate of convergence (e.g., 
of the marginal distribution—a priori unknown—of the process) to asymptotic dis- 
tributions, etc. A number of research projects including [29], supervised by EDF 
engineers and researchers, are underway on this subject. Their aim is to improve 
the credibility of extrapolations of extreme value distributions, and in the long run 
reinforce certain demonstrations of robustness of industrial facilities. 


Part II 
Elements of Extensive Statistical Analysis 


Chapter 7
Regional Extreme Value Analysis


Jérôme Weiss and Marc Andreewsky 


Abstract This chapter focuses on the occurrence of extreme natural hazards in
non-instrumented locations, whose neighborhood contains sites for which measured data
are available. The spatial regionalization approach developed here, which is based
on the detection of regions that are homogeneous in terms of risk, makes it possible to
propose a localized quantification of these events.


7.1 Introduction 


Recall that extreme value theory provides a theoretical and rational framework to 
estimate the intensity of events like significant wave heights with a return period 
of 100 years (or 1 000 or even 10 000 years) at a site of interest. In environmental 
engineering, these return levels are usually estimated using local statistical analyses, 
whose principle is to apply extreme value theory to only observations from the 
site of interest. A direct application of this theory can sometimes pose difficulties,
because the length of time over which observations have been made at a given site
is generally short (e.g., around 10-30 years in the case of sea-based data). In such
situations, uncertainty in extrapolations associated with long return periods is often 
far too high to assess risk in a meaningful quantitative way, i.e., to provide real 
decision support. The aim of this chapter is therefore to examine the potential of 
regional analysis to obtain less uncertain estimates of extreme levels. 

Indeed, regional approaches [196] have been introduced with the aim of taking 
advantage of information available in several time series (or at several sites) near to the 
site of interest, so as to help forecast return levels of extreme hazards at a local level. 
This type of approach is of particular interest today since increasingly large spatial 
databases have become available, providing project managers with large amounts of 
useful information. This chapter examines the possibility of taking advantage of a 
regional enlargement of the extreme hazard zone of interest to: (a) group together
more extreme events and thus obtain larger sample sizes; and (b) build a predictive
field at specific sites.


7.1.1 General Principle 


Regional approaches are therefore based on the rather natural assumption that at sim- 
ilar sites—labeled “homogeneous”—characterized by the same natural phenomena, 
the occurrence probabilities of extreme events are similar. In such cases, all collected 
information can be used to estimate so-called “regional” probability distributions, 
leading to more certainty in estimations. Logically, more robust estimates of return 
levels could be derived. 

More technically, the statistical concept of regionalization is based on the fol- 
lowing hypothesis: homogeneity of a region, for the type of event being considered, 
results in equality between the distributions of the frequency of maximum amplitude 
events (over a fixed period, typically a year) at each measuring station (site), up to 
some scale factor. This factor embodies, or rather summarizes, the precise features 
of each site. In the case of river flow data, the scale factor is defined as the index 
flood. In the motivating case of marine data considered in this chapter (wave height, 
storm surge), it is called the sea index. 


7.1.2 Historical Setting 


The term index flood actually comes from the application domain where regional 
studies were originally conducted: hydrology. The regional analysis methods pre- 
sented here have a more general scope, though they essentially originate in the 
approach introduced by Dalrymple [196] to estimate the occurrence frequency of 
floods using an index flood model. Stedinger et al. [720] subsequently showed that 
the index flood model, using information at the regional scale, effectively reduces 
uncertainty in return level estimates relative to local analyses. 

Since the work of Dalrymple [196], regional analysis has become popular in 
hydrology and meteorology [397]. For example, the use of this type of model in the 
estimation of extreme flows in watercourses can be found in the works of Cunnane 
[189], De Michele et al. [205], Javelle et al. [410], Madsen et al. [494], Merz et al. 
[512] and Stedinger [719]. The modeling of extreme rainfall has been studied by 
Borga et al. [101], Hosking et al. [397] and Schaefer [687], and extreme winds by 
Escalante-Sandoval [256], Goel et al. [344] and Sotillo et al. [714]. Note also that 
UK regulations for estimating extreme rainfall recommend the use of such regional 
models [3]. 

As for maritime applications, Van Gelder et al. [761] have used regional methods 
to estimate extreme sea levels at 13 sites located on the Dutch North Sea coast. 
Extreme high sea levels were modeled at 9 sites in the North Sea by [760] and
at 11 sites on the eastern coast of the Sea of Japan by [342, 343]. Extreme storm
surges out to sea have been modeled by Bernardara et al. [86] at 18 sites on the
French Atlantic coast and the English Channel, and by Bardet et al. [57] at 21 sites in
the same area. In addition, Hosking [395] considered tsunami run-ups¹ at 114 sites
in the Pacific Ocean. A common characteristic of these studies is a homogeneity of 
maritime hazards over rather broad areas. Note also that the regulatory framework for 
estimating storm surges in France recommends statistical studies at local or regional 
levels [15]. 

Lastly, recent research results [780], conducted at EDF and applied to wave, 
storm surge, rainfall, and wind data, are able to take into account the spatial dimen- 
sion of phenomena. In particular, spatio-temporal declustering* algorithms specific 
to regional analyses have been proposed, including techniques that deal with the 
correlation between extreme values from neighboring sites. Also introduced was the 
concept of regional return period of an event, in contrast to the local return period 
of an extreme value. 


7.1.3 Principle of Regional Analysis 


The regional analyses proposed in this chapter are based on a POT approach to 
extreme values, and thus on modeling with a GPD distribution. Indeed, this generally 
allows better use of information available on extreme events [433]. If, for example, 
several particularly intense events occurred in the same year, then the annual maxima 
method cannot really take advantage of this information. Conversely, in a year with 
no significant events, including the annual maximum when estimating extreme events 
may skew the results. In both cases, the POT approach avoids these pitfalls (although, 
as mentioned in the previous chapters, it is still affected by correlation problems in 
the data). By comparing the performance of these two methods on extreme value 
estimates, [492] has shown the superiority of the POT approach for regional analyses. 


7.1.3.1 Modeling 


The following modeling is directly derived from the previously introduced index 
flood model. Consider an area defined by $N$ sites of interest, each with a measuring
station. For $i \in \{1, \ldots, N\}$, the random variable $X_i$ denotes an extreme hazard
characterizing site $i$, with cumulative distribution function $F_{X_i}$. The hypothesis of
regional homogeneity supposes that the probability distribution of extreme values is
the same at each site, up to normalization with a local index $\mu_i$ at each site $i$. This is
written:

$$\frac{X_i}{\mu_i} \sim Y \sim F_r, \qquad (7.1)$$

where $Y$ is the so-called regionalized random variable for the hazard, with cumulative
distribution function $F_r$. Thus, $\forall i \in \{1, \ldots, N\}$, $\forall y \in \mathbb{R}$,

$$F_{X_i}(\mu_i y) = F_r(y). \qquad (7.2)$$

In terms of return levels, Formula (7.1) is equivalent to

$$x_T^i = \mu_i \, y_T, \qquad (7.3)$$

where


• $x_T^i$ denotes the return level with period $T$ at site $i$,
• $y_T$ is the regional return level with period $T$.

¹ The run-up is the maximal amplitude of the tsunami at its contact with the coast, i.e., the height of
the wave above the mean high tide level.


The definition of the local index $\mu_i$ depends on whether we choose to model the
actual values $X_i$ recorded over a threshold $u_i$ (or exceedances), written

$$X_i = S_i \mid S_i \ge u_i,$$

or the threshold deviations instead:

$$Z_i = S_i - u_i \mid S_i \ge u_i.$$

Though it is equivalent to work with the exceedances $X_i$ or the threshold deviations
$Z_i$ in local analyses, this is no longer the case for regional ones. Indeed, the choice
of one or the other has an impact on both the structure of the local index and the
regional distribution.


The exceedance-based approach. In this framework, extreme values at site $i$ are
those which exceed a threshold $u_i$. Extreme value theory recommends modeling
these using a generalized Pareto distribution:

$$\forall i \in \{1, \ldots, N\}, \quad X_i \sim \mathrm{GPD}(u_i, \sigma_i, \xi_i).$$

The regional homogeneity hypothesis (7.2) gives

$$\forall x > \frac{u_i}{\mu_i}, \quad P\left(\frac{X_i}{\mu_i} \le x\right) = P(S_i \le \mu_i x \mid S_i \ge u_i) = F_r(x).$$

Following [669], and noting that $\forall i$, $F_r(u_i/\mu_i) = 0$ and $F_r(u_i/\mu_i + \varepsilon) > 0$ $\forall \varepsilon > 0$,
we have that the ratio $u_i/\mu_i$ is the lower bound of the support of the distribution $F_r$.
In particular, as this bound should not depend on $i$, the local index $\mu_i$ is constrained
to be proportional to the threshold $u_i$. Without loss of generality, we can therefore
choose the local index $\mu_i$ to be equal to the threshold:

$$\mu_i = u_i. \qquad (7.4)$$

Thus, under the regional homogeneity hypothesis,

$$Y \sim \mathrm{GPD}(1, \gamma, \xi),$$

where the scale and shape parameters $(\gamma, \xi)$ satisfy

$$\gamma = \frac{\sigma_i}{u_i}, \qquad \xi = \xi_i.$$

Renormalizing by the local index, this implies that

$$X_i \sim \mathrm{GPD}(u_i, \gamma u_i, \xi). \qquad (7.5)$$

In particular, the shape parameter of the generalized Pareto distribution is constant
for all sites of the homogeneous region, and the threshold $u_i$ alone contains all local
details for site $i$. Ways to determine this threshold will be discussed in Sect. 7.2.
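
The scaling relation between the regional and local distributions can be checked numerically. In the sketch below, the regional parameters and the site thresholds are hypothetical values chosen for illustration; `gpd_quantile` is the standard quantile function of a generalized Pareto distribution with location, scale and shape parameters.

```python
import math


def gpd_quantile(p, loc, scale, shape):
    """Quantile of order p of a GPD(loc, scale, shape)."""
    if shape == 0.0:
        return loc - scale * math.log(1.0 - p)
    return loc + scale / shape * ((1.0 - p) ** (-shape) - 1.0)


gamma, xi = 0.2, -0.1                          # hypothetical regional scale and shape
thresholds = {"site A": 3.0, "site B": 4.5}    # hypothetical local thresholds u_i

p = 0.99
y_p = gpd_quantile(p, 1.0, gamma, xi)          # quantile of the regional variable Y

# Quantiles of X_i ~ GPD(u_i, gamma * u_i, xi) versus u_i times the regional quantile.
local = {s: gpd_quantile(p, u, gamma * u, xi) for s, u in thresholds.items()}
scaled = {s: u * y_p for s, u in thresholds.items()}
```

Both dictionaries contain the same values up to rounding: each local quantile is simply the regional quantile multiplied by the local index $\mu_i = u_i$.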


The threshold-deviation-based approach. For the variable $Z_i$, the local index can
take any form. In effect, the regional homogeneity hypothesis becomes

$$\forall x > 0, \quad P\left(\frac{Z_i}{\mu_i} \le x\right) = P(S_i \le u_i + \mu_i x \mid S_i \ge u_i) = F_r(x).$$

There are now no constraints on the value of $\mu_i$ since the support $\mathbb{R}^+$ of the variable
$Z_i/\mu_i$ does not depend on $i$. Thus, giving values to the local index is a (subjective)
choice to be made. For instance, numerous studies define an empirical local index
as the mean or median of the observed values of $Z_i$. The mean usually seems a
reasonable choice that may avoid potential regional heterogeneity and the effects of
the dependence between the sites [780, 781].

If we make the classical choice that $\mu_i = E[Z_i] = \gamma_i/(1 - \xi_i)$, the regional
homogeneity hypothesis becomes, in terms of distributions,

$$Y \sim \mathrm{GPD}(0, 1 - \xi, \xi).$$

The only regional parameter is the shape $\xi$, shared by all sites. This fairly rigid
regional distribution (even more so in the Gumbel/exponential setting with $\xi = 0$)
is a real problem. Indeed, a good level of flexibility of the regional distribution is
desirable for correctly representing the most extreme values and incorporating the
effect of external covariates.

Therefore, it is generally better to adopt the exceedance-based approach
rather than the second approach; doing so provides two regional parameters rather
than one and avoids a subjective assessment.
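
For completeness, the form of the regional distribution obtained with the mean as local index can be recovered in one line from the scaling property of the GPD. The derivation below is a sketch, assuming a common shape $\xi < 1$ (so that the mean exists) and writing $\gamma_i$ for the local scale of $Z_i$:

```latex
Z_i \sim \mathrm{GPD}(0, \gamma_i, \xi), \quad \xi < 1
\;\Longrightarrow\;
\mu_i = E[Z_i] = \frac{\gamma_i}{1 - \xi},
\qquad
Y = \frac{Z_i}{\mu_i} \sim \mathrm{GPD}\!\left(0, \frac{\gamma_i}{\mu_i}, \xi\right)
  = \mathrm{GPD}(0,\, 1 - \xi,\, \xi).
```

The local scale $\gamma_i$ cancels out, which is why the shape is the only remaining regional parameter in this setting.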


7.1.4 Main Steps of Regional Analyses 


Regional analysis, studied in depth in [780], involves the following steps.


1. Sampling extreme values $X_i$ at various sites. In local analyses, temporal criteria
are typically used to transform observed extreme values into independent events.
However, in a regional approach, sampling extreme values requires not only taking
into account the temporal flow of events generating extreme values, but also
their spatial structure. Section 7.2 focuses on this question and provides a method
guaranteeing spatio-temporal independence of extreme values occurring in the
study area.

2. Grouping together sites into homogeneous regions. How best to divide up the
study area into homogeneous regions is still an open question. A method for
finding a good partition, proposed in [780], involves identifying typical storm
footprints generating extreme measurements and is detailed in Sect. 7.3.

3. Grouping together observed values $x_i$ normalized by the local index $\mu_i$ in the
regional sample (this is called pooling) using Eq. (7.1).

4. Calculating the effective regional duration. After grouping together the extreme
values from different sites in the same region as a new sample, we obtain a new
observed time series over an “effective” regional time period. The value of this
effective duration is a function of the level of spatial dependency found in the
region. Section 7.3.3 deals with taking into account spatial dependency in order
to deduce various quantities, such as the effective duration or return period of a
storm.

5. Estimating the regional distribution $\mathrm{GPD}(1, \gamma, \xi)$ using the regional sample and
standard parameter estimation techniques.

6. Calculating the regional return level $y_T$ after extrapolating the previously
estimated regional distribution.

7. Calculating return levels $x_T^i$ at each site by rescaling with the local index,
using (7.3).

8. Comparing performance with local analyses. Section 7.5 shows a comparison of
local and regional analyses in terms of uncertainty in return level estimates, as
well as in terms of capacity to model the occurrence of supposed outliers*.


7.2 Sampling Extreme Values


In this section, we present a sampling procedure for extreme values which can be used 
for performing regional analyses. In the classical time series framework, the declus- 
tering principle is based on retaining a unique maximum from a dependent group 
of values exceeding a threshold. The idea is to extract from a series of exceedances 
(POT approach) a subset of supposedly independent terms, so as to respect the theo- 
retical underpinning of extreme value statistics (see Definition 6.4 and Theorem 6.2). 
In the spatio-temporal setting for regional analyses, this sampling procedure must 
be modified. 

The sampling method we present can be seen as a regional extension of the
traditional POT approach (at any given site). In the POT context, Bernardara
et al. [87] recommend using a “double threshold” approach when processing
environmental data, and this approach is used in the remainder of this chapter. The
idea is to separate the uncertainty on the physical mechanisms of a phenomenon from
a purely statistical analysis of the extreme values:


(i) identify independent events, starting from an initial physical (“natural process”) 
threshold, where the variable being studied is largely out of its typical range; 

(ii) with these events, determine a second “statistical” threshold to use in estimating 
extreme values. 


One of the notable advantages of this “double threshold” approach is the possi- 
bility of considering high statistical thresholds without altering the physically based 
part of extreme events, the first threshold having already identified it. 

Though this is a general approach, it is easily understandable with the help of an 
example. We have therefore illustrated it throughout the text using a case study on 
marine storms (significant wave heights). 


7.2.1 Spatio-Temporal Declustering 


7.2.1.1 Constructing Physically Plausible Samples 


In order to build physically plausible samples within the regional framework of 
a “double threshold” POT approach, we first define storm objects. The aim is to 
trace the spatio-temporal dynamics of environmental extreme values generated by 
large-scale meteorological phenomena, solely from available recorded data. 

Here, storms are directly characterized in terms of a variable of interest (for 
example, significant wave heights or wind speed). In particular, we define them as 
meteorological phenomena generating extreme events (maritime in this case) in at 
least one site in the study area. The procedure used to detect them amounts to defining 
them as coherent spatio-temporal overshoots of a high-valued quantile. A similar 
approach for detecting storms from wind speed measurements has been proposed by 
[469, 561]. 

Let $q_p$ denote the order-$p$ quantile (where $p$ is close to 1) of the original observed
time series at a given site. We suppose that a storm has hit the site if $q_p$ is exceeded.
The values $q_p$ represent the physical thresholds defined in the “double threshold”
POT approach. Storms propagate in space and time, and detecting them depends on a
spatio-temporal declustering. The main idea is to suppose that neighboring extreme
values in space and time were generated by the same storm. More precisely, two
extreme values are spatio-temporal neighbors if the following conditions are both
satisfied:


• the events happened less than $\Delta$ hours apart,
• the events were at sites within the $\eta$-nearest neighbors of each other.


Thus, three parameters are required to detect a storm: $p$ (defining the impact), and
$(\Delta, \eta)$ related to spatio-temporal propagation. A more detailed description of
these can be found in [782].


• The choice of $\Delta$, related to the propagation time of extreme values between
  neighboring sites, notably allows us to detect successive storms impacting the same
  region (as for example the storms Martin and Lothar, which occurred between
  26 and 28 December 1999, or the series of storms which hit Brittany between
  December 2013 and March 2014).

• The choice of $\eta$, which depends in part on the distance between sites, allows us to
  distinguish between different storms leading to extreme values at different places.


The parameters $(p, \Delta, \eta)$ need to be chosen so as to best represent the dynamics of
typical storms. For example, if $p$ is too large and $\Delta$ (or $\eta$) too small, it is likely that
the same extreme natural event will be incorrectly divided into two or more events.
Conversely, if $p$ is too small and $\Delta$ (or $\eta$) too large, two distinct events may be
incorrectly combined as a single storm.

Furthermore, the values chosen for $(p, \Delta, \eta)$ may be quite different, depending
on the actual variable being studied. Factors influencing the choice of values include
propagation dynamics to other sites, spatio-temporal resolution of available data, and
the presence of missing data. It is therefore recommended to perform a sensitivity
study when choosing $(p, \Delta, \eta)$ to ensure, for example, that the dynamics of the most
extreme events are correctly reproduced. Due to how they have been constructed,
identified storms are considered independent of each other, both from spatial and
temporal points of view, making the choice of parameters even more important. This
step corresponds to identifying physically plausible storms.
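
The neighbor rule of this section can be sketched as a connected-components search:
two exceedances are linked when they are close in both time and space, and each
connected group is taken as one storm. The following is only a minimal illustration,
not the implementation used in [782]. The function name, the input layout, and the
convention that a site is its own neighbor are assumptions made here.

```python
from itertools import combinations


def decluster(exceedances, neighbors, delta_hours):
    """Group threshold exceedances into storms (connected components).

    exceedances: list of (site, time_in_hours) pairs at which q_p was exceeded.
    neighbors:   dict mapping each site to the set of its eta nearest sites
                 (hypothetical input, assumed precomputed from the grid).
    delta_hours: the time window Delta.
    """
    parent = list(range(len(exceedances)))

    def find(k):
        # Union-find root lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(range(len(exceedances)), 2):
        (site_a, t_a), (site_b, t_b) = exceedances[a], exceedances[b]
        close_in_time = abs(t_a - t_b) < delta_hours
        # Assumption: a site counts as its own neighbor.
        close_in_space = site_a == site_b or (
            site_b in neighbors[site_a] and site_a in neighbors[site_b]
        )
        if close_in_time and close_in_space:
            parent[find(a)] = find(b)

    storms = {}
    for k, event in enumerate(exceedances):
        storms.setdefault(find(k), []).append(event)
    return sorted(storms.values())
```

The pairwise loop is quadratic in the number of exceedances, which is acceptable for
a sketch. A real data set with hourly series at many sites would need the events
sorted by time so that only pairs within the window are compared.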


7.2.1.2 Constructing Statistical Samples 


The second step of the “double threshold” approach consists in building statistical
samples which will be used only to examine statistical aspects of the extreme values.
This involves looking at the strongest recorded storms; at each site, a new threshold
$u$ greater than the physical threshold $q_p$ is then determined.

The value of $u$ can be chosen so that $\lambda$ storms per year occur on average at
each site. From an annual point of view, at a site $i$, the statistical threshold $u_i$ thus
corresponds to on average $\lambda$ storms. The value of $\lambda$ needs to be determined in terms
of the application in mind and is often the result of a bias-variance trade-off for
best-estimating extreme quantiles.

Storms are then re-defined from the statistical point of view. If a site $i$ was impacted
by a storm, the storm is retained if and only if $u_i$ was exceeded. This step corresponds
to the identification of “statistical” storms.
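
As an illustration of this second threshold, the sketch below picks $u$ at one site so
that about $\lambda$ storm peaks per year exceed it, then keeps only the storms above $u$.
Placing $u$ halfway between two consecutive ranked peaks is an assumption made here
for concreteness. The text only requires that $\lambda$ storms per year occur on average.
The function names are hypothetical.

```python
def statistical_threshold(storm_peaks, n_years, lam=1.0):
    """Return a threshold u exceeded by about lam storm peaks per year.

    storm_peaks: peak values of the physical storms observed at one site.
    n_years:     length of the observation period in years.
    """
    n_keep = int(round(lam * n_years))
    if not 0 < n_keep < len(storm_peaks):
        raise ValueError("need more physical storms than lam * n_years")
    ranked = sorted(storm_peaks, reverse=True)
    # Assumption: u sits halfway between the n_keep-th and (n_keep+1)-th peaks.
    return 0.5 * (ranked[n_keep - 1] + ranked[n_keep])


def statistical_storms(storm_peaks, u):
    """Keep the storms whose peak exceeds the statistical threshold u."""
    return [x for x in storm_peaks if x > u]
```

With six peaks over two years and $\lambda = 1$, two storms are retained. Tied peak values
around the cut would need a separate convention.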


7.2.2 Application: Marine Storms 


7.2.2.1 Available Data 


Throughout this chapter, various features of regional analysis methods are illustrated
on an application involving extreme wave heights. The data come from the
ANEMOC-2 marine database described in [744], constructed using retrospective
simulation over the Atlantic Ocean between 1979 and 2010. Significant
wave heights were extracted for 1847 sites (Fig. 7.1), with a spatial resolution of
approximately 120 km in the North Atlantic, 20 km on the European coast, and a
finer 10 km on the French coast. For each of the 1847 sites, the hourly time
series of significant wave heights was extracted for the period 1979-2010.


7.2.2.2 Extracting Physical Storms 


Here, marine storms are therefore described as natural events generating extreme
significant wave heights in at least one of the 1847 sites in Fig. 7.1. After various
tests, the values


$$p = 0.995, \qquad \Delta = 2\ \mathrm{h}, \qquad \eta = 10$$


appear to provide a satisfactory representation of the natural processes behind storms,
where $(\Delta, \eta)$ are connected respectively to the propagation of extreme waves between
neighboring sites and the grid’s density in Fig. 7.1.

This set of parameters leads to the extraction of 5939 marine storms. The thresholds
$q_{0.995}$ for detecting storms (99.5% quantiles of the hourly significant wave
heights) are shown in Fig. 7.2. The lowest quantiles (located near the English Channel
and North Sea coasts) are between 2 and 6 m, while the largest overall are around
14 m.

To illustrate the procedure for extracting marine storms with the chosen parameters,
the propagation of the storm Daria from 24-26 January 1990 is shown in Fig. 7.3.
Figure 7.5 shows several of the most well-known marine storms that occurred during
the study period, in addition to Daria. In particular, the storms Lothar and Martin,
which occurred in the same region and less than 36 h apart, are correctly separated
into two distinct events (Fig. 7.5c, d).


Fig. 7.1 Locations of the 1847 sites extracted from ANEMOC-2. Figure taken from [782] with
editorial permission


Fig. 7.2 Detection thresholds for marine storms: 0.995 quantiles for wave height time series (m).
Figure taken from [782] with editorial permission


Fig. 7.3 Propagation of the storm Daria (24-26 January 1990) at 5 h intervals. The red points
indicate affected sites (exceeding the 0.995 quantiles of the wave height time series) and the gray
points the full footprint of the storm. Figure taken from [782] with editorial permission

In addition, the exploratory study of the 5939 marine storms extracted over the period
1979-2010 shows that:


• on average, 192 storms occurred per year in the study region,
• on average, each storm affected 38 of the 1847 sites,
• on average, storms lasted 12.5 h at any given site,
• storms were much more frequent in winter than in summer: 89% occurred from
  October to March, with 20% in the month of January alone.


7.2.2.3 Extracting Statistical Storms 


Here as well, after various tests, the value $\lambda = 1$ was selected to extract statistical
storms from among the physically plausible ones. This determined the statistical
thresholds $u_i$, all of which are greater than the physical ones $q_{0.995}$, as well as the
local indices given in Eq. (7.4). These thresholds are shown in Fig. 7.4, with values
ranging from 3.26 to 16.84 m. From the 5939 physical storms, 1340 statistical storms
were identified.


Fig. 7.4 Marine storms: statistical thresholds for wave heights (exceeded on average $\lambda = 1$ times
per year) in meters. Figure taken from [783] with editorial permission


7.3 Selecting Homogeneous Regions 


Equation (7.1) supposes that sites can be grouped into homogeneous regions, in 
each of which hazards at each site behave statistically similarly. This grouping of 
homogeneous regions must answer above all to a physical reality that validates this 
fundamental hypothesis of homogeneity. For instance, which zones share conditions 
(covariates) that can explain the extreme phenomenon? Does this physical hypoth- 
esis correspond to statistical homogeneity? 


7.3.1 Identifying Physical Homogeneity 


The identification of potentially homogeneous regions may be based on a physical
criterion involving the storms extracted in the previous section. In particular, regions
can be defined as the typical footprints of storms generating extreme events, which
has the advantage of giving a physical interpretation to the results: sites in a given
region will tend to be impacted by the same storm. It is therefore expected that
extreme values observed in a region, coming from the same natural events, will
behave similarly from a statistical point of view, and thus satisfy the homogeneity
hypothesis. We now summarize the method, which is described in more detail in
[782].


Fig. 7.5 Examples of marine storm footprints. a 15-16 October 1987, b New Year’s Day Storm
(31 December 1991-1 January 1992), c Lothar (26 December 1999), d Martin (27 December
1999), e Kyrill (17-23 January 2007), f Johanna (10 March 2008), g Klaus (23-24 January 2009)
and h Quentin (8-13 February 2009). Points represent affected sites


7.3.1.1 Detecting Typical Storm Footprints 


The aim is to partition the set of sites so that each subset corresponds to a typical storm 
impact region. The use of classification procedures based on a storm propagation 
criterion is one way to achieve this. 

For any two sites $i$ and $j$ in the study region, let $p_{ij}$ be the probability that extreme
values are produced during a storm at both sites, given that either $i$ or $j$ is affected.
These probabilities can be empirically estimated for all pairs of sites $(i, j)$, given
a sample of previously extracted storms. The storm propagation criterion is then
defined as


$$d_{ij} = 1 - p_{ij}. \tag{7.6}$$


If $d_{ij} = 0$, then any storm impacting one of the sites also impacts the other.
Conversely, if $d_{ij} = 1$, storms hitting one site never hit the other.
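
A minimal sketch of the empirical estimate follows, assuming each extracted storm is
stored as the set of sites it impacted (a hypothetical data layout). The probability
$p_{ij}$ is estimated by the share of storms hitting both sites among the storms
hitting at least one of them.

```python
def propagation_distance(storms, site_i, site_j):
    """Empirical storm propagation criterion d_ij = 1 - p_ij.

    storms: list of sets, each holding the sites impacted by one storm.
    """
    either = sum(1 for s in storms if site_i in s or site_j in s)
    both = sum(1 for s in storms if site_i in s and site_j in s)
    if either == 0:
        raise ValueError("neither site was ever impacted by a storm")
    return 1.0 - both / either
```

For example, with four storms impacting {A, B}, {A}, {B, C} and {A, B, C}, sites A and
B are hit together in two of the four storms that reach either of them, so their
distance is 0.5.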

The next step involves grouping together sites into $R$ disjoint regions, where the
distance between sites is defined directly in terms of the storm propagation criterion
(7.6). The resulting partition can then be seen as a set of storm footprints. In particular,
ascending hierarchical clustering using Ward’s method (or one of its extensions
[731]), based on maximizing inter-class variance, can be used. The resulting hierarchy
of regions can be shown as a dendrogram [122] (see Fig. 7.7) in which heights are
defined in terms of inter-class distances which depend on the distance criterion (7.6)
chosen.


Ascending hierarchical clustering. 


1. Each site is initially considered a region to itself. 

2. The “closest” regions to each other in terms of criterion (7.6) are combined 
to make a new region. 

3. Stop when only one region remains (the whole study region). 
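
The three steps above can be sketched in a few lines. The version below applies
Ward’s criterion through the Lance-Williams update to the squared dissimilarities
$d_{ij}^2$, which amounts to treating the criterion (7.6) as if it were a Euclidean
distance. This is an assumption of the sketch, not necessarily the exact variant used
in [782]. The function names and the list-of-lists input are hypothetical.

```python
def ward_merges(dist):
    """Ascending hierarchical clustering with Ward's update rule.

    dist: symmetric matrix (list of lists) of dissimilarities d_ij.
    Returns the merges in order as (members_a, members_b, height).
    """
    n = len(dist)
    clusters = {k: [k] for k in range(n)}
    # Squared dissimilarities between currently active clusters.
    d2 = {(a, b): dist[a][b] ** 2 for a in range(n) for b in range(a + 1, n)}
    merges, next_id = [], n
    while len(clusters) > 1:
        (a, b), best = min(d2.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        merges.append((sorted(clusters[a]), sorted(clusters[b]), best ** 0.5))
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            nc = len(clusters[c])
            dac = d2[(min(a, c), max(a, c))]
            dbc = d2[(min(b, c), max(b, c))]
            # Lance-Williams recurrence for Ward's method.
            updated[c] = ((na + nc) * dac + (nb + nc) * dbc - nc * best) / (
                na + nb + nc
            )
        d2 = {k: v for k, v in d2.items() if a not in k and b not in k}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for c, value in updated.items():
            d2[(c, next_id)] = value  # c < next_id because ids only grow
        next_id += 1
    return merges


def cut_into_regions(dist, n_regions):
    """Replay the first merges until n_regions groups of sites remain."""
    groups = {frozenset([k]) for k in range(len(dist))}
    for a, b, _ in ward_merges(dist)[: len(dist) - n_regions]:
        groups -= {frozenset(a), frozenset(b)}
        groups.add(frozenset(a + b))
    return sorted(sorted(g) for g in groups)
```

The third element of each merge plays the role of a dendrogram height, so a large
jump between consecutive heights is the kind of signal used to choose $R$.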


Given the range of storm footprint configurations obtained from the dendrogram 
as R changes, the goal is to identify the “best” one. The optimal number of regions 
may be determined using some measure of clustering quality. A good partition should 
contain well-separated regions, with sites similar to each other found in each region. 
In particular, a significant jump in height as R changes indicates that two dissimilar 
sites have been joined together. Visual detection of this kind of jump can help choose 
the final number of regions to keep, as well as identify the most representative storm 
footprints. Sites grouped together as a region will tend to be impacted by the same 
storms (though not always), and a storm impacting a certain region will tend to stay 
within that region. 


7.3.1.2 Application: Marine Storms 


From the 5939 marine storms extracted from the period 1979-2010, the storm
propagation criterion (7.6) was calculated for each pair of sites out of the 1847.
Hierarchical clustering using Ward’s method provides us with several possible storm
footprint configurations as a function of the number of regions asked for (Fig. 7.6).
It is interesting to note that a form of geographic contiguity between grouped sites
is obtained without imposing any contiguity constraint on the algorithm.

The dendrogram is shown in Fig. 7.7a. Its sequence of heights (Fig. 7.7b) suggests
we cut the tree at $R = 5$, giving a subdivision of the study area into five smaller
regions, corresponding to the most common storm footprints. These supposedly
homogeneous regions are shown in Fig. 7.8: the South Atlantic (Region 1, 399 sites),
North Atlantic (Region 2, 479 sites), North Sea (Region 3, 241 sites), English Channel
and its approaches (Region 4, 392 sites), and the Bay of Biscay (Region 5, 336 sites).


Fig. 7.7 a Dendrogram. b Evolution of dendrogram heights as a function of the number of regions.
Figure taken from [782] with editorial permission


Fig. 7.8 Marine storms: partition of the study area into 5 regions corresponding to representative
storm footprints. Figure taken from [782] with editorial permission


7.3.2 Testing for Statistical Homogeneity


The previous section showed how to construct regions that are homogeneous from a
physical point of view; it remains to test their statistical homogeneity, i.e.,
whether the distribution of extreme values normalized by the local index is the same
across the whole region.


7.3.2.1 Methods 


One of the most often-used methods for testing a region’s statistical homogeneity
was proposed by Hosking et al. [396]. It is based on the principle that the ratios of
L-moments characterizing the probability distributions should in theory be identical
at each site in a homogeneous region.


Definition 7.1 (L-moment) Let $X$ be a random variable. Its L-moment of order
$r \in \mathbb{N}^*$ is defined by


$$\lambda_r = \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} \, E\left[X_{r-k:r}\right],$$


where $X_{k:r}$ is the $k$th smallest order statistic of an independent $r$-sample of values of $X$.
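
In practice the expectations in Definition 7.1 are replaced by sample quantities. A
common choice, used in the sketch below, is the unbiased estimator built from
probability-weighted moments $b_0, \dots, b_3$ of the ordered sample. The function
names are hypothetical, and at least four observations are assumed.

```python
def sample_l_moments(values):
    """Return the first four sample L-moments (l1, l2, l3, l4)."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed")
    # Unbiased probability-weighted moments; i is the 0-based rank.
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3)
    )
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


def l_moment_ratios(values):
    """Return (L-CV, L-skewness, L-kurtosis) = (l2/l1, l3/l2, l4/l2)."""
    l1, l2, l3, l4 = sample_l_moments(values)
    return l2 / l1, l3 / l2, l4 / l2
```

For the symmetric sample 1, 2, 3, 4, 5 the third sample L-moment vanishes, as
expected for a measure of asymmetry.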


According to [396], problematic sites within a region can first be detected using
a discordance criterion $D$, which measures whether a given site is significantly
different from the others in the same region in terms of L-moments. At site $i$, the
criterion is defined by


$$D_i = \frac{N}{3} \, (\omega_i - \bar{\omega})^{\top} S^{-1} (\omega_i - \bar{\omega}),$$


with $\bar{\omega} = N^{-1} \sum_{i=1}^{N} \omega_i$ and
$S = \sum_{i=1}^{N} (\omega_i - \bar{\omega})(\omega_i - \bar{\omega})^{\top}$, where
$\omega_i = (t_i, t_{i,3}, t_{i,4})^{\top}$ is the vector of the first three empirical
L-moment ratios observed at site $i$, which estimate


$$\tau = \frac{\lambda_2}{\lambda_1}, \qquad \tau_3 = \frac{\lambda_3}{\lambda_2}, \qquad \tau_4 = \frac{\lambda_4}{\lambda_2}.$$


In particular, a site $i$ is considered discordant if $D_i > 3$ for regions with more than
15 sites. If 4 < N < 15, a table of critical values for $D_i$ is provided in [396].
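
The criterion can be computed directly from its definition, reading it as
$D_i = \frac{N}{3} (\omega_i - \bar{\omega})^{\top} S^{-1} (\omega_i - \bar{\omega})$.
The sketch below assumes the L-moment ratios of each site are already available as
triples and inverts the 3 x 3 matrix $S$ by cofactors. Function names are hypothetical.

```python
def inverse_3x3(m):
    """Inverse of a 3 x 3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular matrix: sites are degenerate")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def discordance(omegas):
    """Return the discordance D_i of each site.

    omegas: list of N triples (L-CV, L-skewness, L-kurtosis), one per site.
    """
    n = len(omegas)
    mean = [sum(w[k] for w in omegas) / n for k in range(3)]
    dev = [[w[k] - mean[k] for k in range(3)] for w in omegas]
    s = [[sum(d[a] * d[b] for d in dev) for b in range(3)] for a in range(3)]
    inv = inverse_3x3(s)
    return [
        n / 3 * sum(d[a] * inv[a][b] * d[b] for a in range(3) for b in range(3))
        for d in dev
    ]
```

A convenient sanity check is that the $D_i$ always sum to $N$, since
$\sum_i (\omega_i - \bar{\omega})^{\top} S^{-1} (\omega_i - \bar{\omega})$ equals the
trace of the 3 x 3 identity matrix.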


Following this, a region’s degree of statistical homogeneity can be evaluated using
a heterogeneity criterion $H$, which indicates whether the dispersion of L-moments
observed across sites is comparable to that which would be expected in a statistically
homogeneous region. This criterion is defined by


$$H = \frac{V - \mu_V}{\sigma_V},$$


where $(V, \mu_V, \sigma_V)$ corresponds to the empirical dispersion, mean, and standard
deviation of the coefficients of variation (called L-CV) $t_i$ at site $i$. As the regional L-CV
is defined by


$$\bar{t} = \frac{\sum_{i=1}^{N} n_i t_i}{\sum_{i=1}^{N} n_i},$$


where $n_i$ is the sample size at site $i$, the empirical dispersion is given by


$$V = \left\{ \frac{\sum_{i=1}^{N} n_i (t_i - \bar{t})^2}{\sum_{i=1}^{N} n_i} \right\}^{1/2}.$$

The test itself (called the Hosking-Wallis test) is based on the fact that under the
homogeneity hypothesis, the random variable $H$ follows a 4-parameter Kappa
distribution, whose cumulative distribution function is given by


$$F(x) = \left\{ 1 - h \left[ 1 - \frac{k (x - \mu)}{\alpha} \right]^{1/k} \right\}^{1/h},$$


where $\mu$ is a location parameter, $\alpha$ a scale parameter, and $(k, h)$ shape parameters.
This distribution, which contains the GEV ($h = 0$) and GPD ($h = 1$) ones as special
cases, is sufficiently flexible to approximate most types of variables seen in
environmental science [782]. According to the test, a region is considered homogeneous
if $H < 1$, possibly heterogeneous if $1 \le H < 2$, and definitely heterogeneous
otherwise.
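
The computation of $H$ can be sketched as follows. In the procedure of Hosking and
Wallis, $\mu_V$ and $\sigma_V$ are usually obtained by simulating many homogeneous
regions from a fitted Kappa distribution. That simulation step is not reproduced here:
the sketch simply assumes a list of simulated dispersions is available. The function
names are hypothetical.

```python
from statistics import mean, stdev


def dispersion(l_cvs, sizes):
    """Weighted dispersion V of the at-site L-CVs t_i (weights n_i)."""
    total = sum(sizes)
    t_bar = sum(n * t for n, t in zip(sizes, l_cvs)) / total  # regional L-CV
    return (sum(n * (t - t_bar) ** 2 for n, t in zip(sizes, l_cvs)) / total) ** 0.5


def heterogeneity(v_observed, v_simulated):
    """H = (V - mu_V) / sigma_V, with mu_V and sigma_V taken from simulations."""
    return (v_observed - mean(v_simulated)) / stdev(v_simulated)


def verdict(h):
    """Three-way reading of H given in the text."""
    if h < 1:
        return "homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"
```

Because $V$ can fall below the simulated mean, negative values of $H$ are possible
and simply indicate less dispersion than expected under homogeneity.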

Thus, for each of the previously identified representative storm footprints, the 
following procedure is used, leading to regions that are homogeneous physically and 
statistically. 


Homogeneous clustering procedure.

(a) Calculate the heterogeneity criterion $H$. If $H < 2$, go to (d), otherwise go
    to (b).

(b) Calculate the discordance criterion at each site.

    • If no site is discordant, go to (c);
    • otherwise, remove discordant sites and again calculate the heterogeneity
      criterion $H_0$.

      - If $H_0 < 2$, go to (d);
      - otherwise, go to (c).

(c) Divide the region into two new sub-regions according to the hierarchical
    clustering dendrogram calculated previously. For each sub-region, go to
    (a).

(d) The region is homogeneous both statistically and physically.
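
Steps (a) to (d) form a recursion, which the sketch below makes explicit. The three
callbacks (`heterogeneity`, `discordant_sites`, `split`) are hypothetical placeholders
for the heterogeneity criterion, the discordance check and the dendrogram cut. The
guard for single-site regions is an assumption added so that the recursion always
terminates.

```python
def homogeneous_regions(region, heterogeneity, discordant_sites, split):
    """Return a list of regions accepted as homogeneous.

    region:           list of sites.
    heterogeneity:    callable giving H for a list of sites.
    discordant_sites: callable giving the set of discordant sites.
    split:            callable giving two sub-regions (dendrogram cut).
    """
    # (a) accept the region when H < 2; single sites cannot be split further.
    if len(region) < 2 or heterogeneity(region) < 2:
        return [region]  # (d)
    # (b) try removing discordant sites, if there are any.
    bad = discordant_sites(region)
    if bad:
        kept = [site for site in region if site not in bad]
        if heterogeneity(kept) < 2:
            return [kept]  # (d)
    # (c) split the original region and restart from (a) on each part.
    left, right = split(region)
    return homogeneous_regions(
        left, heterogeneity, discordant_sites, split
    ) + homogeneous_regions(right, heterogeneity, discordant_sites, split)
```

In step (c) the sketch splits the full region rather than the reduced one, which
matches the marine storm application, where Region 5 keeps all of its sites when it
is divided in two.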


7.3.2.2 Application to Marine Storms 


We first calculate the heterogeneity criterion H for each of the 5 regions. According to 
Table 7.1, Regions 1, 2 and 3 can be considered statistically homogeneous. However, 
statistical heterogeneity is present in Regions 4 and 5, so these need to be re-defined 
before proceeding with a regional analysis. 

Looking closer at Region 4, we find that 15 of the 392 sites are discordant. Once
these have been removed, the region becomes statistically homogeneous ($H = 0.68$).
The removed sites, which do not form a homogeneous region themselves, are
found close to the coast, where local effects can significantly influence extreme wave
heights. Thus, the English Channel region is restricted to the 377 non-discordant sites.


Table 7.1 Marine storms: the heterogeneity criterion H for each of the five regions

        Region 1    Region 2    Region 3    Region 4    Region 5
H         1.60       −0.07       −2.36        4.22        7.36


Fig. 7.9 Marine storms: dividing the study area into six homogeneous regions (both in the
physical and statistical sense). Figure taken from [782] with editorial permission

As for Region 5, removing the discordant sites does not improve its statistical
homogeneity. We therefore divide it into two storm footprints with the help of
the hierarchical clustering shown in the dendrogram: northern Bay of Biscay
($H = -0.45$, 234 sites) and southern Bay of Biscay ($H = -4.69$, 102 sites). These two
regions are now statistically homogeneous. In particular, strong statistical asymmetry
can be seen in the northern compared to the southern region.

Thus, we end up with six homogeneous (both in the physical and statistical sense) 
regions, with which we can now estimate extreme value quantiles using regional 
analysis. The regions are shown in Fig. 7.9. 


7.3.3 Accounting for Spatial Dependency 


Spatial dependency can be modeled by the choice of a correlation structure within 
a statistical model. This choice can be guided by the use of non-parametric tools 
like correlograms, which are autocorrelation diagrams; see Sect. 5.4.1.1. Here, we 
build on the work in [783], where a model for characterizing spatial dependency 
between storms was proposed. Based on the tendency of sites to act similarly during
a storm, dependency is characterized using a function involving both the propagation 
of storms and their regional strength. This model helps estimate the effective duration 
of a regional sample (and thus predict the merit of extrapolation) and compare local 
and regional return periods of a given storm. 


7.3.4 Modeling Dependency Between Sites 


The homogeneous regions constructed earlier, corresponding to typical storm foot- 
prints, exhibit by construction strong dependency between their sites. Therefore, it 
is necessary to find a way to model this at the regional level in a way relevant to the 
pooling strategy. 

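Weiss et al. [783] have proposed a model that captures spatial dependency during
storms by linking the probability distribution of the regional maximum with the
regional distribution $F_r$. Let $M_s$ be the regional maximum occurring during storm
$s$:


$$M_s = \max_{i=1,\dots,N} Y_i^s, \tag{7.7}$$


where $Y_i^s$ denotes the normalized value observed at site $i$ during storm $s$.
For $x > 1$, the distribution of $M_s$ is given by


$$P(M_s > x) = \frac{\lambda}{\lambda_r} \left(1 - F_r(x)\right) \varphi(x), \tag{7.8}$$


where $\lambda$ and $\lambda_r$ represent respectively the mean number of storms per year at each
site and in the region, and $\varphi$ the regional dependency function:


$$\varphi(x) = 1 + \sum_{i=1}^{N-1} P\left( \max_{j=i+1,\dots,N} Y_j^s \le x \,\middle|\, Y_i^s > x \right), \quad \text{for } x > 1. \tag{7.9}$$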


This model expresses the fact that a region’s sites tend to act similarly during a storm
that hits it. In particular, dependency between sites is encoded by the function $\varphi$,
which simultaneously characterizes a storm’s propagation and regional strength.
The values of $\varphi$ are theoretically between 1 (total dependency) and $N$ (total
independence). The lower the value of $\varphi$, the greater the regional dependency, indicating
that most sites are affected by each storm and tend to behave similarly in terms
of generated extreme values. In order to compare $\varphi$ across regions, the effect of a
region’s size $N$ can be removed with the help of the dimensionless function


$$\Phi = \frac{N - \varphi}{N - 1},$$


with a value between 0 (total independence) and 1 (total dependency). Weiss et al.
[783] show that under the hypothesis that the distribution of the regional maximum
$M_s$ is identical to that of the regional distribution $F_r$, the function $\varphi$ simplifies to a
constant:


$$\varphi(x) = \frac{\lambda_r}{\lambda}. \tag{7.10}$$

Therefore, under this hypothesis, each storm hitting the region generates extreme 
values with the same (normalized) distribution as the maximum value observed 
during the event. Though in practice this hypothesis can usually be accepted, it 
should really be checked using statistical tests. In particular, it implies that the way 
in which sites behave does not depend on the storm’s intensity and that there is 
no dominant storm trajectory across the region: any observed storm can potentially 
affect any site within it. This is coherent with the way we form homogeneous regions 
by identifying typical storm footprints. 

This regional dependency model can then help us to perform certain tasks, like
estimating the effective regional duration and establishing a link between local and
regional return periods of a storm. Moreover, the regional sample used to estimate
the regional distribution is constructed according to the hypotheses of this model.


7.4 Local and Regional Diagnosis 


7.4.1 Constructing Regional Samples 


The pooling method consists in grouping together the normalized observed values at 
sites across the region into the same sample, before estimating the parameters of the 
regional distribution from it. However, the totality of these observed values should 
not be directly grouped into the same sample, due to a certain level of redundant 
information caused by spatial dependency. 

Storms—containing spatial footprints of extreme values from the same event— 
can nevertheless help us to filter out this dependency. To obtain independent data, 
only the normalized regional maximum of each storm can be retained. This approach 
echoes the simplification hypothesis in the regional dependency model (7.10). 
Bernardara et al. [86] used a similar filtering method, keeping only the maximum 
from the set of extreme values observed over a 72 h period in a given region. Thus,
if $n_r$ storms are observed in a region, the regional sample will be made up of $n_r$
normalized regional maxima.


7.4.2 Effective Regional Duration


As well as constructing the regional sample using pooling, we need to know its 
effective duration, i.e., the duration of observed values after having grouped together 
extreme values from several sites. This effective regional duration $D$ essentially refers
to the fact that on average, $\lambda$ storms occur per year over a period of $D$ years, where
$\lambda$ is the mean number of storms per year at each site of the region. This quantity can
then be used to help calculate empirical regional return periods, as well as quantify 
the gains in using a regional rather than local analysis. 

Calculation of this effective duration is closely linked to the question of spatial 
dependency from Sect. 7.3.4: the stronger the dependency, the shorter the effective 
duration. 


(i) If all sites are completely independent of each other, then a given storm will
    only affect one of the region’s sites, no matter how strong it is. In this case, each
    observed value brings new information, and $D$ can be given as the sum of local
    observed durations $d_i$ across all sites $i$:


$$D = N \bar{d} = \sum_{i=1}^{N} d_i.$$


(ii) At the other extreme, if all sites are entirely dependent on each other, then a given
    storm will affect all of them. In this case, $D$ corresponds to the local observed
    duration at one site only, as the information coming from the others is purely
    redundant. Thus, we have that


$$D = \bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i.$$


Intuitively, the likely value of $D$ is found somewhere between these two extremes.
Weiss et al. [783] have therefore proposed the more general form:


$$D = \varphi \, \bar{d}, \tag{7.11}$$


where $\varphi$ represents the degree of regional dependency as defined in Sect. 7.3.4. This
way of writing $D$ covers the total dependency ($\varphi = 1$) and total independence
($\varphi = N$) cases too. Under the hypothesis of Equation (7.10), $D$ can therefore be rewritten
as


$$D = \frac{\lambda_r}{\lambda} \, \bar{d}. \tag{7.12}$$


A natural estimate of the mean number of storms $\lambda_r$ per year in the region is given
by $n_r / \bar{d}$, $n_r$ being the number of recorded storms there. Thus, the effective duration
$D$ can be estimated by


$$\hat{D} = \frac{n_r}{\lambda}.$$

7.4.3 Local and Regional Return Periods 


Regional analysis provides a framework for understanding risk at different spatial 
scales. In particular, the spatial dependency model described earlier can help to 
distinguish between a storm’s regional and local return periods. For example, a storm 
with a 100-year return period in a given region will likely not lead to extreme values 
with a 100-year return period at each of the region’s sites. We now take a closer look 
at the link between these two return periods. 

Let $s$ be a storm and $x$ its maximal (normalized) intensity in a region. The regional
return period of $s$, denoted $T_r$, is defined as the mean time between two storms
affecting at least one site in the region with a normalized intensity greater than $x$:


$$T_r = \frac{1}{\lambda_r \, P(M_s > x)}. \tag{7.13}$$


The local return period of $s$, denoted $T$, is defined as the mean length of time between
two storms impacting any given site in the region with an intensity greater than $x$:


$$T = \frac{1}{\lambda \left(1 - F_r(x)\right)}. \tag{7.14}$$


From Eq. (7.8), $T_r$ and $T$ are connected in the following way:


$$T_r = \frac{T}{\varphi(x)}. \tag{7.15}$$


Furthermore, under the simplification hypothesis for the regional dependency model,
we have that


$$T_r = \frac{\lambda}{\lambda_r} \, T. \tag{7.16}$$
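
The link between the two return periods follows in one line from the definitions.
The short derivation below is a sketch that reads Eq. (7.8) as
$P(M_s > x) = \frac{\lambda}{\lambda_r}(1 - F_r(x))\,\varphi(x)$ and the simplification
hypothesis (7.10) as $\varphi(x) = \lambda_r / \lambda$.

```latex
\begin{align*}
T_r &= \frac{1}{\lambda_r \, P(M_s > x)}
      && \text{by (7.13)} \\
    &= \frac{1}{\lambda_r \cdot \dfrac{\lambda}{\lambda_r}
       \bigl(1 - F_r(x)\bigr)\, \varphi(x)}
      && \text{by (7.8)} \\
    &= \frac{1}{\lambda \bigl(1 - F_r(x)\bigr)} \cdot \frac{1}{\varphi(x)}
     = \frac{T}{\varphi(x)}
      && \text{by (7.14)} \\
    &= \frac{\lambda}{\lambda_r} \, T
      && \text{if } \varphi(x) = \lambda_r / \lambda .
\end{align*}
```

Since $\varphi \ge 1$, the regional return period is never longer than the local one:
an event that is rare at any single site is seen more often somewhere in the region.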


7.4.4 Application to Marine Storms 


For each of the six homogeneous regions, a regional sample is obtained by pooling
the normalized regional maxima of each storm. Anderson-Darling tests [723] are
run to validate the simplification hypothesis for the regional dependency model, and
show that the distribution of the maxima could be assimilated to that of the standard
regional distribution.


Table 7.2 Marine storms: indices showing spatial dependency in each region. $N$ is the number of
sites in each region, $\lambda_r$ the mean number of storms per year per region, $\Phi$ the dimensionless
regional dependency function, $D$ the estimated effective regional duration (in years, with 95%
confidence intervals), and $100_r$ the regional return period (in years) of storms generating an
extreme event with a 100-year return period in at least one of the sites of the region

              Region 1     Region 2     Region 3     Region 4     Region 5     Region 6
$N$             399          479          241          377          234          102
$\lambda_r$     10.3         25.2         9.0          8.8          4.1          2.7
$\Phi$          0.98         0.95         0.97         0.98         0.99         0.98
$D$             320          782          279          274          128          84
(95% CI)      [284, 356]   [698, 866]   [234, 324]   [232, 316]   [106, 150]   [68, 100]
$100_r$         9.7          4.0          11.1         11.3         24.2         36.9

Table 7.2 provides values summarizing the spatial dependency within each region. 
Storms appear to be most frequent in Region 2 and least frequent in Region 6, 
with respective occurrences of 25.2 and 2.7 storms per year on average. These quite 
different levels of storm activity were strongly influenced by the sizes of these regions, 
which were respectively the largest and smallest. The dimensionless dependency 
value can instead be used to compare the regions, irrespective of the influence of 
size. Thus, spatial dependency is largest in Region 5, meaning that the sites of this 
region have a strong tendency to behave similarly during storms. Indeed, on average, 
57 of the 234 sites (24%) were impacted by each storm that hit the region. In contrast, 
the lowest level of spatial dependency was found in Region 2. There, on average, 19 
of the 479 sites (4%) were impacted by each storm that hit the region. 

As for the effective regional duration, grouping together the observed values at,
for example, the 234 sites of Region 5 led to a regional sample whose effective duration
was estimated at 128 years, with a 95% confidence interval of [106 yr, 150 yr]. It is of
interest to note that taking into account spatial dependency considerably reduced the
effective duration that would have been found under the independent sites hypothesis
(here, D = 234 × 31 = 7254 years).

Furthermore, the quantity $100_r$ corresponds to the regional return period (in years)
of a storm generating an extreme value with a local return period of 100 years in at
least one of a region’s sites (Eq. (7.16) with T = 100). For example, $100_r = 4$ in
Region 2, which means that on average, every four years, a storm in this region
generates an extreme value with a return period of 100 years in at least one site.
Also, while Region 1 is much larger than Region 3, their regional return periods are
similar (around 10 years). Under the hypothesis of independence between sites, these
quantities would have been much larger proportionally in Region 3 than in Region
1. However, this is not the case here due to a higher degree of spatial dependency in
Region 1. Taking this into account, therefore, leads to a more realistic evaluation of
regional risks.
Table 7.3 Marine storms: empirical return periods of the strongest storms in each region at the
local and regional scales

Region   Storm               Empirical regional    Empirical local
                             return period (yr)    return period (yr)
1        15-18 Feb. 1986     31                    321
2        11-13 Feb. 1979     31                    783
3        11-13 Dec. 1990     31                    280
4        8-10 Feb. 1988      31                    275
5        15-17 Dec. 1989     31                    129
6        23-24 Jan. 2009     31                    85


For each region, we have estimated the return periods of the most powerful storms
at the local and regional scales. The results are shown in Table 7.3. For example, in
Region 3, as the normalized strength of the storm of 11-13 December 1990 was never
exceeded over the 31-year period, the regional return period is naturally estimated
at 31 years. Applying Eq. (7.16), the local return period is estimated at 280 years.


7.5 Comparing the Performance of Methods 


Let us look at what we gain from using the regional approach instead of the local one. 
This comparison is run both in terms of uncertainty affecting estimated return levels, 
and in how outliers are taken into account, i.e., sample values that are significantly 
different from the others. Essentially, this means looking at the ability of the two 
approaches to detect the strongest storms. 


7.5.1 Dealing with Uncertainty in Estimated Return Levels 


Several major authors, such as [720], have shown that regional analyses lead to more 
reliable estimation of extreme quantiles than local ones. Indeed, while estimations 
based on the latter are generally unbiased, they may suffer from high variance [397]. 
Therefore, our comparison between the two approaches will be based on the variance
of return level estimates, rather than on the point estimates themselves.

The method used here consists in comparing the confidence intervals of the fol- 
lowing estimates: 


• Local return levels estimated using regional analyses: in each homogeneous region,
we fit a GPD using the regional sample. Normalizing by local indices
then brings us back to the local scale.
Panels: a) (37.5, 40.95)  b) (-36.25, 48.57)  c) (7.94, 53.95)


Fig. 7.10 Marine storms: comparison of estimated return levels using regional (gray) and local 
(red) analyses, with confidence interval upper bounds set at 95% (dotted line) for the 6 specified 
geographic locations. Crosses represent observed extreme values 


• Local return levels estimated at each site using local analysis: at each site, we fit
a GPD using local threshold exceedances.


Confidence intervals in both cases are calculated using parametric bootstrap [244]. 
We then apply this method comparing the two types of analysis to our marine storm 
case study. 
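
The parametric bootstrap step can be sketched as follows. This is only an illustrative outline, not the exact procedure of the chapter: the method-of-moments GPD fit (chosen here because it needs no optimizer), the function names, and the synthetic sample are all assumptions made for the example.

```python
import math
import random

def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (assumed estimator; requires shape xi < 1/2)."""
    n = len(excesses)
    m = sum(excesses) / n
    v = sum((e - m) ** 2 for e in excesses) / (n - 1)
    ratio = m * m / v                  # equals 1 - 2*xi for a GPD
    return 0.5 * m * (1.0 + ratio), 0.5 * (1.0 - ratio)   # (sigma, xi)

def sample_gpd(n, sigma, xi, rng):
    """Draw n GPD excesses by inverting the distribution function."""
    out = []
    for _ in range(n):
        u = rng.random()               # uniform on [0, 1)
        if abs(xi) < 1e-9:             # exponential limit
            out.append(-sigma * math.log(1.0 - u))
        else:
            out.append(sigma / xi * ((1.0 - u) ** (-xi) - 1.0))
    return out

def return_level(T, threshold, rate, sigma, xi):
    """T-year return level for `rate` exceedances per year above `threshold`."""
    if abs(xi) < 1e-9:
        return threshold + sigma * math.log(rate * T)
    return threshold + sigma / xi * ((rate * T) ** xi - 1.0)

def bootstrap_ci(excesses, T, threshold, rate, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval: refit on samples simulated from the fitted GPD."""
    rng = random.Random(seed)
    sigma, xi = fit_gpd_moments(excesses)
    levels = []
    for _ in range(n_boot):
        s_b, xi_b = fit_gpd_moments(sample_gpd(len(excesses), sigma, xi, rng))
        levels.append(return_level(T, threshold, rate, s_b, xi_b))
    levels.sort()
    lower = levels[int((1.0 - level) / 2.0 * n_boot)]
    upper = levels[int((1.0 + level) / 2.0 * n_boot) - 1]
    return return_level(T, threshold, rate, sigma, xi), lower, upper

# Synthetic excesses above a hypothetical 5 m threshold, 3 exceedances per year
data = sample_gpd(200, 1.0, 0.1, random.Random(1))
point, lower, upper = bootstrap_ci(data, T=100, threshold=5.0, rate=3.0)
print(f"100-year level: {point:.2f} m, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Running the same routine on a local sample and on a (larger) regional sample would give the two interval widths that the comparison relies on.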

Figure 7.10 shows the return levels for significant wave heights estimated by 
regional and local analysis at 6 sites, one from each of the homogeneous regions. We 
see that the confidence intervals for local analysis are generally wider than those for 
regional analysis. 

Figure 7.11 shows the relative change (in %) in the width of the 95% confidence
intervals for centennial wave heights (100-year return levels) when moving from
local to regional analysis. The darker the color, the greater the improvement due
to regional analysis; lighter colored regions are those where regional analysis
was less precise than local analysis. These relative values range from +196% (near the
Orkney Islands in the North Sea) down to −91% (near the western tip of Norway
in the North Sea), with a median value of −56%. Lighter colored regions are in the
minority, indicating that regional analysis generally leads to a decrease in uncertainty.

This decrease in uncertainty due to regional analysis is illustrated by the concept
of effective regional duration D (see Sect. 7.4.2). Indeed, as local analysis is based on
observation durations $d_i < D$, uncertainty in the extrapolation of extreme quantiles
is reduced with regional analysis.
Fig. 7.11 Marine storms: relative evolution (in %) of the width of 95% confidence intervals of
estimated centennial wave heights, first by local, then regional analysis (map axes: longitude and latitude)


7.5.2 Taking Outliers into Account 


By its nature, regional analysis helps to reduce the apparent rarity of storms seen at 
a local scale, and to better model their likelihood of occurrence. 

Take for example the marine storm context. Suppose that we have recorded at a
given site in Region 1 (South Atlantic) a millennial wave height (1,000-year return
level). The probability of having observed this height at this site during the period
of recorded measurements (31 years) is $1 - 0.999^{31} \approx 3\%$. Now consider this
storm at the regional level. As the effective duration of observations in Region 1 was
estimated at 320 years (see Table 7.2), the probability of having observed a millennial
height over a 320-year sample is now $1 - 0.999^{320} \approx 27\%$.
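
The two probabilities above follow from the same elementary formula, sketched here under the assumption of independent years; the function name is illustrative.

```python
def prob_observed(return_period_years: float, duration_years: float) -> float:
    """Probability that a level with the given return period is exceeded
    at least once during `duration_years`, assuming independent years."""
    annual_exceedance = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_exceedance) ** duration_years

local = prob_observed(1000, 31)       # one site, 31 years of measurements
regional = prob_observed(1000, 320)   # Region 1, effective duration of 320 years
print(f"local: {local:.1%}, regional: {regional:.1%}")
```

The only change between the two calls is the duration, which is how the effective regional duration makes a millennial event roughly nine times more likely to appear in the sample.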

Furthermore, the storm of 15-18 February 1986 did generate, in this region, the 
highest normalized wave height. Therefore, while a local analysis would only show 
that such a storm was observed once in 31 years (duration of observations made at 
each of the region’s sites), taking into account regional information leads to a re- 
evaluation of this empirical frequency as once in 320 years (effective observation 
duration of Region 1), and this for the whole region. Indeed, the underlying principle 
of the regional approach used here is that there is no preferential storm path within 
a region; each observed storm could have occurred anywhere else in the region, 
consistent with the formation of homogeneous regions based on the identification of 
representative storm paths. 
The storm Klaus of 23-24 January 2009 generated, in Region 6 (southern Bay
of Biscay), high wave height outliers at several sites, showing the extreme nature of 
the storm. At a given site in the region, the wave height associated with Klaus has 
been estimated to reoccur on average every 85 years (empirically, i.e., based on the 
effective regional duration) or 157 years (theoretically, based on an estimate of the 
regional GPD distribution). However, a similar strength storm is estimated to reoccur 
every 31 years (empirically) or 58 years (theoretically) in this region. 

These simple examples show that an extreme event at a given site (which could 
even be categorized as an outlier) may no longer be considered extreme at the regional 
scale. Looking at storms from a regional point of view can therefore potentially reduce 
their “exceptional” nature when observed locally. 


7.6 Conclusions 


The regional analysis approach developed in this chapter is based on distinguishing 
between: 


(i) regional effects (using a marine storm case study, where storm strengths and 
spatial footprints were observed at the synoptic scale); 

(ii) purely local effects (via a local index increasing or reducing the regional impact 
of storms). 


This supposed separation of effects can help to understand processes generating
observed extreme values, while simultaneously reducing complexity. Several possible
paths can be considered to improve this regional approach, some of which are
described below.


7.6.1 Formation of Homogeneous Regions 


In this chapter, homogeneous regions are obtained using representative storm foot- 
prints. However, the use of covariates involving natural processes (e.g., wind strength 
and direction, atmospheric pressure, bathymetry, etc.) combined with multivariate 
analysis methods (such as principal components analysis (PCA) [467]) could either 
help improve the current method or provide an alternative for constructing homoge- 
neous regions, where sites are grouped together based on their similarity with respect 
to the covariates. 

The formation of homogeneous regions could also benefit from an approach
based on weather patterns, i.e., distinct and typical synoptic situations. For example,
Garavaglia et al. [303] have identified weather patterns in France so as to relate
rainfall events with the processes generating them. Each observed extreme value is then
linked with a corresponding weather pattern, which can notably help homogenize
samples. A similar approach could be developed to construct homogeneous regions
in a regional analysis study. For instance,


(i) each site could be associated with the weather pattern that most frequently leads 
to extreme values at this site; 
(ii) sites could then be clustered based on these types of dominant weather pattern. 


It would also be possible to define homogeneous regions with the help of correlo- 
grams specifically dedicated to extreme values. These are called extremograms [201]: 
pairs of sites are considered to be part of the same region if their observed extreme 
values are at least partially correlated. 


7.6.2 Residual Heterogeneity 


It is possible that the Hosking-Wallis test of statistical homogeneity [397] fails to
detect a regional heterogeneity. This is unfortunate because a source of
heterogeneity, even a minor one, can have a significant effect on the final results. One
way to deal with this is to include the “traces” of heterogeneity that the test was unable 
to detect. The bootstrap technique for calculating uncertainty could, for example, 
integrate these suspicions of heterogeneity: each simulated bootstrap region could 
be slightly heterogeneous, and the whole uncertainty would then include uncertainty 
due to undetected heterogeneity. This remains difficult to do in practice since het- 
erogeneity can come in many forms, and there are an infinite number of ways to 
artificially simulate heterogeneous regions [780]. 


7.6.3 Historical Regional Analysis 


Uncertainties associated with estimating extreme-valued events have pushed the 
development of methods looking to increase the sample size, like regional analy- 
sis does. Taking advantage of information available on historical severe events is 
thus an alternative approach to constructing regional samples. The main hypoth- 
esis for including historical data in the statistical model is that there exists some 
threshold above which all exceedances were recorded during the historical period 
of interest. In a local analysis of extreme rainfall, Payrastre et al. [602] provided a 
method to include historical information in the likelihood function. Also, Gaume 
et al. [322] and Nguyen et al. [557] developed a regional Bayesian model that can 
take into account severe rainfall occurring at sites where measurements were not 
systematically recorded (see Chap. 11). 

However, collecting historical data is usually a complicated task and the resulting 
dataset is often imprecise. Subsequent statistical analyses must therefore take this 
into account so as not to introduce bias into estimates. Although tricky, the extension 
of regional analysis to take into account historical information is a promising avenue
for further reducing uncertainty in extreme value estimates. 


7.6.4 Non-stationary Regional Analysis 


In this chapter, the question of non-stationarity (temporal trends due to climate change
and/or the effect of physical covariates influencing regional distributions) has not
been studied. With regard to the extreme value statistics for storms presented in
this chapter, non-stationarity is not really a problem, as many studies indicate that
the overall storm situation in Europe has not shown any significant trends (see the
MICORE² project [272] and the Preface of this book for further details).

Nevertheless, in the case of proven or hypothesized non-stationarity, Hanel et al. 
[378] and Roth et al. [669] have, for example, introduced a non-stationary regional 
model allowing temporal trends to be incorporated into the regional distribution’s 
parameters. Techniques for dealing with extreme values of non-stationary time series 
are presented in the following chapter. 


7.6.5 Multivariate Regional Analysis 


This chapter has only looked at the univariate case. However, extreme environmental 
conditions are often characterized by different combinations of several interdepen- 
dent variables (e.g., wind and extreme wave heights), so simultaneously taking into 
account all of these variables could certainly help to better characterize the risk of 
extreme events. In this context, Chebana et al. [158] have extended the regional index 
flood model to the multivariate case using copulas. Methods and tools for dealing 
with multivariate extreme values are presented in Chap. 9. 


2 http://www.micore.eu. 


Chapter 8
Extreme Values of Non-stationary Time Series


Sylvie Parey and Thi-Thu-Huong Hoang 


Abstract This chapter considers the problem of quantifying extreme natural hazards 
in non-stationary situations, namely outside the traditional framework. For certain 
hazards, this framework makes it possible to take into account the influence of climate 
change. Two main approaches are considered: the first deals with the incorporation 
and selection of trends in the parameters of the laws of extremes, while the second, 
nonparametric, considers trend variations over the first two moments of the extreme 
distribution. 


One of the underlying hypotheses of the classical theory of extreme values is that 
variables are identically distributed or stationary, i.e., the distribution is the same 
for any subsample of the original sample. This hypothesis is rarely satisfied by 
natural phenomena, if only because the latter are often seasonal (see Chap. 5). In 
general however, by restricting ourselves to the season in which events of interest 
occur, it is usually possible to obtain a stationary setting. However, in cases where 
a phenomenon evolves over time, it is necessary to use other techniques to take 
non-stationarity into account. These are the subject of the present chapter, and are 
introduced in the univariate setting in Chap. 6. 

Remember nevertheless that taking into account detected non-stationarity is not 
necessarily easy in practice. Indeed, this is an ongoing subject of discussion, espe- 
cially in hydrology. Montanari and Koutsoyiannis [521] and Serinaldi and Kilsby 
[698] consider that it has not been proven that doing so brings real improvement. 
Their main arguments are the following: 


• Estimating a non-stationary model implies estimating more parameters, which
necessarily increases uncertainty. It seems all the more necessary to compare the
confidence intervals and not simply the point estimates obtained.

• The evolution of the variables whose extreme value behavior we are trying to
estimate is generally poorly understood (projected values of climate parameters are
still not very reliable for certain variables, e.g., rainfall [779]) or even
unpredictable. It cannot be inferred from observations alone, and the extrapolation
of a deterministic trend over long time periods is not a particularly good idea unless
the trend represents a stationary exogenous process, in which case it would need
to be determined.


These authors consider that stationarity should remain the default hypothesis, 
even in cases of proven non-stationarity, and that in any case, it is always necessary 
to compare confidence intervals obtained with or without the stationarity hypothesis. 

This chapter first presents two of the main methods for modeling non-stationarity, 
either by adding trends to the parameter values (which thus become functions) 
of extreme value distributions, or nonparametrically using a mean-variance trend 
approach. Note that any non-stationary modeling choice requires the notion of return
level to be redefined because, as it stands, it presupposes a certain regularity
in phenomena. Moreover, other risk indicators may be better suited. A section of
this chapter is dedicated to this question. Lastly, we study several meteorological
hazards in order to illustrate the two approaches.


8.1 Trends in Extreme Value Distribution Parameters 


The first approach, proposed among others by Coles [173], consists in supposing
that the parameters of extreme value distributions are not constant over time and
depend on some covariate. The most natural covariate here is the time t, but others
may be considered, such as the concentration of greenhouse gases in the
atmosphere, or an index related to variation in the climate system, e.g., the North
Atlantic Oscillation (NAO). This approach, supported by numerous examples and
asymptotic results [173], can be formalized under the MAXB approach through the
modeling hypothesis:


$$X_t \sim F_{\mathrm{GEV}}\left(\mu(t), \sigma(t), \xi(t)\right), \qquad (8.1)$$


where $(X_t)_t$ is a sequence of independent but not identically distributed maxima.
The estimation, selection, and verification of the well-foundedness of the simple
parametric function for each parameter, e.g., linear as in:


$$\mu(t) = \mu_0 + \mu_1 t,$$


is the main task of the modeler. This needs to be a compromise between increasing 
estimation complexity (due to an increasing number of parameters) and a realistic 
representation of the non-stationary behavior of the sequence of observed extreme 
values. The first step in this work is to limit the number of dimensions by asking 
whether it is necessary to make all three parameters non-stationary, or whether non- 
stationarity is sufficiently well modeled when only concentrating on one (or two) 
of the parameters. A variety of techniques can be used to perform an exploratory 
analysis of the temporal evolution of the parameters and indicate the form of the
evolution if it exists. 

The choice of such forms means we can then conduct classical parametric estima- 
tion (e.g., maximum likelihood) and run statistical tests that work in the functional 
context in order to obtain one (or several) pertinent models. 


Example 8.1 


In the GEV case (Eq. (8.1)), once pertinent forms have been chosen for
$\psi(t) = (\mu(t), \sigma(t), \xi(t))$, so that for each observation $x_{t_i}$ at time $t_i$ the following
condition holds:


$$\xi(t_i)\left(\frac{x_{t_i} - \mu(t_i)}{\sigma(t_i)}\right) + 1 \ge 0,$$


then under the independence hypothesis on the data, the statistical likelihood
of the problem:


$$L(\psi(t_{1:n})) = \prod_{i=1}^{n} f\left(x_{t_i} \mid \mu(t_i), \sigma(t_i), \xi(t_i)\right) \qquad (8.2)$$


is well defined and can be maximized using numerical [173] or Bayesian (see
Chap. 11) approaches.
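
As a minimal sketch of the likelihood in this example, the code below evaluates the log-likelihood of a GEV model with a linear trend in the location parameter and constant scale and shape. The function names and the toy data are assumptions for illustration; the maximization itself would be delegated to a numerical optimizer, which is not shown.

```python
import math

def gev_logpdf(x, mu, sigma, xi):
    """Log-density of the GEV distribution; -inf outside its support."""
    if sigma <= 0:
        return -math.inf
    s = (x - mu) / sigma
    if abs(xi) < 1e-8:                       # Gumbel limit (xi -> 0)
        return -math.log(sigma) - s - math.exp(-s)
    z = 1.0 + xi * s
    if z <= 0:                               # support condition violated
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(z) - z ** (-1.0 / xi)

def nonstationary_loglik(params, times, maxima):
    """Log-likelihood with mu(t) = mu0 + mu1 * t, constant sigma and xi."""
    mu0, mu1, sigma, xi = params
    return sum(gev_logpdf(x, mu0 + mu1 * t, sigma, xi)
               for t, x in zip(times, maxima))

# Toy block maxima (hypothetical values) observed at rescaled times t_i = i/n
maxima = [30.1, 31.4, 29.8, 32.0, 33.5, 32.7, 34.1, 33.9]
times = [(i + 1) / len(maxima) for i in range(len(maxima))]
print(nonstationary_loglik((30.0, 4.0, 1.0, -0.1), times, maxima))
print(nonstationary_loglik((32.0, 0.0, 1.5, -0.1), times, maxima))
```

Setting `mu1 = 0` recovers the stationary model, which makes the two fits directly comparable through their maximized log-likelihoods.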


8.1.1 Detecting and Modeling Trends 


8.1.1.1 Semiparametric or Nonparametric Trend Estimation 


Suppose we have n extreme values denoted $x_t = (x_{t_i})_{i=1,\ldots,n}$, with time renormalized so that
$t_i = i/n$. Detecting trends in the classical parameters of the distribution of the
associated random variable $X_t$ is a task that can be conducted using various approaches.
Detecting a trend presupposes that the question of what form it might take has been
asked. In the most general setting, it is preferable not to choose the form a priori and
to let the data reveal the trend (under a few regularity hypotheses, the most usual
being the existence of the second derivative). Semiparametric and nonparametric
approaches can be used to do this. In the following, we describe two of the main
ones. Beforehand, adapting the preliminary independence tests provided
in Table 5.1, we provide in Table 8.1 a summary of preliminary stationarity tests,
selected for their good power at moderate sample sizes. They can appropriately
accompany the tests described hereafter.
Table 8.1 Selected nonparametric tests of stationarity of univariate time series for samples of
moderate size

Test                        Kwiatkowski-Phillips-     Augmented                 Phillips-Perron (PP)
                            Schmidt-Shin (KPSS)       Dickey-Fuller (ADF)
Null hypoth. $H_0$          stationarity (with null   non-stationarity (with    non-stationarity (with
                            mean or constant bias)    or without trend)         or without trend)
Alternative hypoth. $H_1$   non-stationarity          stationarity              stationarity
Minimal sample size         n > 30                    n > 30                    n > 20
(in practice)
References                  [190]                     [297]                     [614]


1. The first is a semiparametric approach called local models (LM)
[202, 371, 635], based on a weighted polynomial representation of the given
extreme value distribution. This approach starts from the hypothesis that the data
$(x_t, t)$ form a point process which is marked (or colored, cf. Definition 6.11) by
the temporal index, whose associated counting process (with n values) is itself
random, possibly Poissonian [371]. The approach consists in iteratively selecting
a set of observations containing information on the distribution at time t
via a kernel $\omega(\cdot)$ with bounded support, typically [−1, 1] [202] (cf. Sect. 4.3.2).
At each time t, the corresponding extreme value is then associated with a vector
of weights $(\omega_i(t))_{i=1,\ldots,n}$ defined by


$$\omega_i(t) = \omega\{(i/n - t)/h\},$$


where h is a smoothing (bandwidth) parameter. A weight $\omega_i(t)$ is 0 when i is
outside the interval $[n(t - h), n(t + h)]$. The combination of this weights vector
and a set of polynomials with statistical likelihood $\exp(\ell(x_i \mid \psi))$ at $x_i$ then allows
a global statistical log-likelihood to be defined at time t [263]:


$$\ell_t(\psi) = \sum_{i=1}^{n} \omega_i(t)\, \ell(x_i \mid \psi). \qquad (8.3)$$


Maximizing (8.3) with respect to $\psi$ then allows a local estimator $\hat{\psi}$ to be
constructed. It is straightforward to modify this approach to associate $\psi$ and the
parameters of a GEV distribution (for example), then interpolate the functions
obtained for $\hat{\mu}_t$, $\hat{\sigma}_t$ and $\hat{\xi}_t$ in terms of t, to finally deduce whether a trend is
present or not, and the type of trend if present. However, the choice of bandwidth
h, polynomial degree, and kernel¹ will all influence the results. The fact that, in
general, the behavior of extreme values changes slowly over time often
encourages the choice of low-degree polynomials, often with degree 1.


2. The second method is a nonparametric penalized maximum likelihood (PML)
approach [155, 156, 600, 668], which supposes, very generally, that in each
dimension j ($\in \{1, 2, 3\}$ for classical GEV and GPD models), the jth component
$\psi_j$ of the parameter vector for an extreme value distribution can be functionally
estimated using a generalized additive model


$$\psi_j(t) = g_j\left(y^{T}\eta_j + h_j(t)\right),$$


where $g_j$ is a link function, $\eta_j$ a parameter vector, y a covariate vector and $h_j$
a nonparametric smooth function whose support is limited to a compact set
containing all of the observed values of t (typically, $h_j$ is a spline [360]). The
estimation of each function $t \mapsto \psi_j(t)$ can then be obtained by maximizing the
penalized log-likelihood [156] (defined here by continuing from Eq. (8.2)):


$$\log L(\psi(t_{1:n})) - \sum_{j=1}^{3} \gamma_j \int \left\{h_j''(t)\right\}^2 dt. \qquad (8.4)$$


The introduction of the penalty term in the estimation rule is a standard technique
that helps to avoid overfitting by selecting smooth functions $h_j$ [355, 380]. More
precisely, the integral terms provide a measure of the roughness of the second-
order derivative terms, while the parameters $\gamma_j$ are selected in a way that limits
the impact of the estimation of the $h_j$ on the regularity of the estimated functions.
For more details, as well as a recent application in the POT context, see [157].
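
The core ingredient of each of the two approaches can be sketched in a few lines: kernel weights and the weighted log-likelihood for LM, and a roughness penalty for PML. This is a rough illustration only. The Epanechnikov kernel, the finite-difference approximation of the integral, and all function names are assumptions made here, and the penalty weights `gammas` are taken as given rather than selected by cross-validation.

```python
def epanechnikov(u):
    """Kernel with bounded support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_weights(n, t, h, kernel=epanechnikov):
    """LM weights: kernel((i/n - t)/h) for i = 1, ..., n."""
    return [kernel((i / n - t) / h) for i in range(1, n + 1)]

def local_loglik(psi, t, h, xs, logdensity):
    """Locally weighted log-likelihood at time t for parameter psi.

    `logdensity(x, psi)` is any user-supplied log-density."""
    weights = local_weights(len(xs), t, h)
    return sum(w * logdensity(x, psi) for w, x in zip(weights, xs) if w > 0.0)

def roughness(h_values, dt):
    """Approximate the integral of h''(t)^2 on a regular grid of step dt,
    using centered second differences at the interior grid points."""
    total = 0.0
    for k in range(1, len(h_values) - 1):
        second = (h_values[k - 1] - 2.0 * h_values[k] + h_values[k + 1]) / dt ** 2
        total += second ** 2 * dt
    return total

def penalized_loglik(loglik, h_curves, gammas, dt):
    """PML criterion: log-likelihood minus the weighted roughness penalties."""
    return loglik - sum(g * roughness(h, dt) for g, h in zip(gammas, h_curves))

w = local_weights(10, t=0.5, h=0.2)
print([round(v, 3) for v in w])                     # non-zero only near i/n = 0.5
print(roughness([2.0 * k for k in range(6)], 1.0))  # a straight line is not penalized
```

A larger bandwidth `h` or larger `gammas` both push the estimated parameter curves towards smoother shapes, which is the tuning trade-off discussed above.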


In both settings, confidence intervals can be obtained using the delta method 
(Theorem 4.4) or bootstrap approaches (see Sect. 8.1.2.3), which remain valid in the 
non-stationary framework under convergence conditions on the statistical estimators. 
Panagoulia et al. [584] recommend using bootstrap methods when sample sizes are 
small. 

The thesis [391] compared the two approaches described above and judged that
the nonparametric PML one was generally more suitable than LM when it came to
the study of non-stationary extreme values of natural hazards. Indeed, the former is
less computationally intensive and leads to better automatic selection when there is
local linearity [635], despite improvements to the LM approach proposed by [371].

For this reason, a nonparametric estimation algorithm for the PML approach
(the results of which are shown in Figs. 8.1 and 8.2) has been proposed for both the
MAXB and POT frameworks, and its convergence to the maximum of the penalized
likelihood is studied in detail in Chap. 3 of [391]. This algorithm is based on the
principle that the optimal estimation of the smoothing parameters is obtained using 


¹ To a lesser extent. In practice, the Epanechnikov or Gaussian kernels are often used.
Fig. 8.1 Nonparametric evolution of the parameters of the GEV distribution fitted to maximum
daily temperature data at La Rochelle (France). The dotted lines show the 90% confidence interval;
the thick curves are the estimates obtained when supposing that the three parameters are variable,
the thin ones when supposing the shape parameter $\xi$ is constant


cross-validation [579], and in the spline case, these computationally costly
calculations can be approximated using an iterative approach proposed by [360].
This iterative cross-validation method is based on rewriting (8.4) in the form of
a least squares criterion involving the Fisher information matrix $(I_{\psi_j, \psi_k})_{j,k}$ of the
extreme value model (cf. Eq. 4.2.2.4) and some pseudo-data.
Fig. 8.2 Nonparametric evolution of the parameters of the GEV distribution fitted to minimum
daily temperature data at Uman (Ukraine). The dotted lines show the 90% confidence interval; the
thick curves are the estimates obtained when supposing that the three parameters are variable, the
thin ones when supposing the shape parameter $\xi$ is constant
8.1.1.2 The Case for Shape Parameter $\xi$


In the dedicated literature, it is commonly accepted that the shape parameter $\xi$ for
extreme value models (GEV and GPD) can be considered constant over time, i.e.,


$$\xi(t) = \xi \quad \forall t.$$


Indeed, Zhang et al. [800] exclude the possibility of variation in this parameter by 
asserting that there is no reason for the shape of the tail of climate variable distribu- 
tions to change over typical periods of observation, which rarely exceed 100 years. 
This assertion uses for support numerous estimation results obtained by Coles [173] 
as well as from simulations run in [202]. In [596], estimations were made over differ- 
ent periods of a long temperature time series from Paris-Montsouris (since 1873), and 
indeed they showed that the shape parameter did not change significantly. For
temperature time series from La Rochelle (France) and Uman (Ukraine), corresponding
to different climates, the recent thesis [391] estimated the nonparametric evolution
of the parameters of the GEV distribution fitted to the annual maxima. Two cases
were examined: one where all three parameters were considered variable, and one
where the shape parameter was constant. This work showed that the constant value of the
shape parameter remained in the confidence interval of the varying estimate of the 
same parameter, while the evolution of the two others remained similar whether or 
not the shape parameter was considered variable (Figs. 8.1 and 8.2). For this reason, 
in the following it is supposed that the shape parameter does not change significantly 
over time. Readers interested in the technical details and a more in-depth historical 
treatment are recommended to consult [391]. 


8.1.2 Estimating and Selecting a Global Functional Model 


The most frequently obtained functional forms for the parameters $\mu(t)$, $\sigma(t)$, and
sometimes $\xi(t)$, are polynomial (quadratic at most, to obtain a monotonic trend that
can be extrapolated) or piecewise linear [596]. It is, therefore, feasible to define
nested models and test the relevance of a more complicated model with respect to
a less complicated one using a likelihood ratio test (LRT); see Theorem 4.1. Let us
recall, in the present case, the following theorem which uses the concept of deviance
[713].


Theorem 8.1 If a model $M_0$ with parameters $\psi^{(1)}$ is a submodel of a model $M_1$ with
parameters $\psi_0 = (\psi^{(1)}, \psi^{(2)})$, obtained by setting $\psi^{(2)} = 0$ (of dimension k), then the deviance
statistic


$$D = 2\left\{\ell(\psi_0) - \ell(\psi^{(1)})\right\},$$

where $\ell(\psi_0)$ and $\ell(\psi^{(1)})$ are the maximized log-likelihoods for each model, tends
asymptotically to a $\chi^2$ distribution with k degrees of freedom under the hypothesis
that the model $M_0$ is valid.


The more complex model with parameters $\psi_0$ is then rejected at level
$1 - \alpha$ if $D < c_\alpha$, where $c_\alpha$ is the order $1 - \alpha$ quantile of the $\chi^2$ distribution with k
degrees of freedom. For example, if we want to test for the existence of a linear trend
in the location parameter of the GEV distribution:


$$\mu(t) = \mu_0 + \mu_1 t,$$


then $\psi^{(1)} = (\mu, \sigma, \xi)$ and $\psi_0 = (\mu_0, \mu_1, \sigma, \xi)$, and the linear trend is considered
significant at 95% if D is greater than $c_\alpha = 3.841$ ($\alpha = 0.05$, k = 1).

It is often necessary to perform several tests to obtain the optimal model, since 
all possible comparisons need to be made. 
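
A single nested comparison can be sketched as below, using the usual definition of the deviance as twice the difference of maximized log-likelihoods. The closed-form survival functions cover only k = 1 and k = 2 extra parameters (an assumption that keeps the example free of external libraries), and the log-likelihood values are hypothetical.

```python
import math

def chi2_sf(d, k):
    """P(chi-square with k degrees of freedom > d), closed forms for k = 1, 2."""
    if d <= 0:
        return 1.0
    if k == 1:
        return math.erfc(math.sqrt(d / 2.0))
    if k == 2:
        return math.exp(-d / 2.0)
    raise ValueError("only k = 1 or k = 2 is handled in this sketch")

def deviance_test(loglik_full, loglik_sub, k=1, alpha=0.05):
    """Return the deviance, its p-value, and whether the larger model is retained."""
    deviance = 2.0 * (loglik_full - loglik_sub)
    p_value = chi2_sf(deviance, k)
    return deviance, p_value, p_value < alpha

# Hypothetical maximized log-likelihoods: GEV with and without a linear trend in mu
d, p, keep_trend = deviance_test(loglik_full=-118.2, loglik_sub=-121.6, k=1)
print(f"D = {d:.2f}, p-value = {p:.4f}, trend retained: {keep_trend}")
```

Comparing the p-value with $\alpha$ is equivalent to comparing D with the quantile $c_\alpha$, since `chi2_sf(3.841, 1)` is approximately 0.05.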


8.1.2.1 The MAXB Framework 


If we wish to identify the optimal model for the block maxima distribution, without
any prior assumption on which parameters potentially include trends (except for the
shape parameter $\xi$, supposed constant), the following tests are required:


$(\mu_0, \mu_1, \sigma)$ against $(\mu, \sigma)$,
$(\mu, \sigma_0, \sigma_1)$ against $(\mu, \sigma)$.


Then, if $(\mu_0, \mu_1, \sigma)$ is retained, test $(\mu_0, \mu_1, \sigma_0, \sigma_1)$ against $(\mu_0, \mu_1, \sigma)$. Else, if
$(\mu, \sigma_0, \sigma_1)$ is retained, then test $(\mu_0, \mu_1, \sigma_0, \sigma_1)$ against $(\mu, \sigma_0, \sigma_1)$.

Note that an alternative goodness-of-fit $\chi^2$ test well adapted to the specific case
of the functional Weibull distribution, which can be easily derived in the Gumbel
and Fréchet cases, is provided in [609].


8.1.2.2 The POT Framework 


In the POT framework, retaining the hypothesis that the shape parameter $\xi$ is constant,
two approaches can be considered.

1. In the first situation, we can fix a threshold and look for a trend in the intensity $\lambda$
of the Poisson process (i.e., in the frequency of exceeding the threshold) and/or
the scale parameter $\sigma$ of the Pareto distribution. The required tests are then:

• $H_0$: ($\lambda$ constant, $\sigma$ variable) against $H_1$: ($\lambda$ constant, $\sigma$ constant); if $H_0$ is
retained, we then test $H_2$: ($\lambda$ variable, $\sigma$ variable) against $H_0$,

• $H_3$: ($\lambda$ variable, $\sigma$ constant) against $H_4$: ($\lambda$ constant, $\sigma$ constant); if $H_3$ is
retained, we then test $H_3$ against $H_5$: ($\lambda$ variable, $\sigma$ variable).

2. In the second situation, we vary the threshold so that the frequency at which it is
exceeded remains constant, then look to see whether it is possible to identify an
additional trend in the scale parameter $\sigma$ of the Pareto distribution. The variable
threshold retained is generally obtained by quantile regression [794] for a very
high quantile threshold for the distribution of the variable being studied (e.g., 95%
or 98%). See Sect. req.quantile._page for more details.


In addition to the use of the deviance via the LRT theorem, two specific statistical
criteria based on parsimony, otherwise known as a model complexity versus fit
tradeoff, are often used in model selection: AIC (Akaike Information Criterion [25])
and BIC (Bayesian Information Criterion [693]). If $\hat{\ell}$ is the maximized log-likelihood
for a model with $p$ parameters, then for a sample of size $n$,

$$\mathrm{AIC} = -2\hat{\ell} + 2p,$$
$$\mathrm{BIC} = -2\hat{\ell} + p \log n.$$


If the sample size n is small (typically n < 11p + 9), a modified AIC can be used 
[399]: 


$$\mathrm{AIC}_c = -2\hat{\ell} + 2p + \frac{2p(p+1)}{n - p - 1}.$$
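As a minimal sketch, the three criteria can be computed directly from a negative log-likelihood. The function names are hypothetical, and the parameter counts used in the example (1 and 2) are an assumption chosen because they are consistent with the AIC values reported in Table 8.4.

```python
import math


def aic(nllh, p):
    """Akaike criterion from a negative log-likelihood and p parameters."""
    return 2.0 * nllh + 2.0 * p


def bic(nllh, p, n):
    """Bayesian criterion for a sample of size n."""
    return 2.0 * nllh + p * math.log(n)


def aicc(nllh, p, n):
    """Small-sample corrected AIC; requires n > p + 1."""
    if n - p - 1 <= 0:
        raise ValueError("the corrected AIC needs n > p + 1")
    return aic(nllh, p) + 2.0 * p * (p + 1) / (n - p - 1)


# Negative log-likelihoods from Table 8.4 (constant threshold, Poisson part).
# The parameter counts are an assumption consistent with the reported AIC.
models = {
    "constant intensity": (480.14, 1),
    "linear intensity": (472.58, 2),
}
best = min(models, key=lambda name: aic(*models[name]))
print(best)
```

The same `min(...)` pattern applies to BIC or the corrected AIC by swapping the key function.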

These criteria are typically used to choose the optimal number of parameters 
by selecting the model that minimizes them. Note however that they have different 
meanings. As it performs a bias-variance tradeoff [97], AIC is designed to be good 
at selecting the “least bad” model in a list that does not contain the true model. In
contrast, BIC, which is known for selecting (possibly underfitted) models with a lower
dimension than AIC [466], is consistent in the sense that it (asymptotically) selects
the true model in a list, given that the model is in the list and unique [130]. See [131] 
for more details. 

AIC would thus seem to be more theoretically sound, and the preferential criterion 
of choice. However, simulations by Panagoulia et al. [584] show that for samples 
of size 20-100, both criteria are equally capable of detecting non-stationarity, and 
BIC selects good models more often, due to its general ability to retain the most 
parsimonious ones. 


8.1.2.3 Bootstrap and Confidence Intervals 


Three types of methods are typically proposed and tested in the literature for esti- 
mating confidence intervals for extreme levels using bootstrap. Bootstrapping means 
generating a large number of equivalent samples starting from the original dataset in 
order to construct a distribution for the quantity whose confidence interval we wish 
to estimate. 

All of the approaches we consider here involve a full sample bootstrap, i.e., all
generated samples have the same size n as the original sample. The specialized liter- 
ature (e.g., [630]) suggests that it is necessary to use subsampling (drawing samples 
of size m < n from the original sample of size n) to obtain good nonparametric esti- 
mates of extreme quantiles, but Panagoulia et al. [584] consider that this restriction 
does not concern the specific case of extreme quantiles since they are estimated as 
functions of parameters of fitted distributions. 

Of the three approaches, two are nonparametric. Let us denote $(x_{t_1}, \ldots, x_{t_n})$ the
sample of non-stationary extreme values.


1. Random-t resampling. Here samples are formed from random draws with replace- 
ment from the initial sample. 

2. Fixed-t resampling. This consists not in resampling from the initial values, but
from the stationary residuals obtained from them. In [138], this method is given
for the block maxima approach, in the steps described below (after supposing that
$\xi(t) = \xi$ as suggested in Sect. 8.1.1.2):

Let $\hat{\mu}(t_i)$, $\hat{\sigma}(t_i)$, and $\hat{\xi}$ be the parameters of a non-stationary GEV model
fitted to the initial sample, $i \in \{1, \ldots, n\}$.

(i) Calculate the independent so-called Cox-Snell residuals, exponentially
distributed:

$$\varepsilon(t_i) = \left[1 + \frac{\hat{\xi}}{\hat{\sigma}(t_i)}\left(x_{t_i} - \hat{\mu}(t_i)\right)\right]^{-1/\hat{\xi}}. \tag{8.5}$$

(ii) Resample the residuals $\varepsilon(t_i)$ as $\varepsilon_b(t_i)$ using random sampling with
replacement.

(iii) Obtain thus a new sample: for $i = 1, \ldots, n$,

$$\tilde{x}_{t_i} = \hat{\mu}(t_i) + \frac{\hat{\sigma}(t_i)}{\hat{\xi}}\left(\varepsilon_b(t_i)^{-\hat{\xi}} - 1\right). \tag{8.6}$$

(iv) Fit a non-stationary GEV model to the new sample and calculate quantities
of interest (parameters, return levels).

(v) Repeat steps (ii)–(iv) a large number of times to obtain a distribution
for each quantity of interest.


3. Parametric bootstrap. Here samples are not generated by resampling from the
initial sample, but rather by simulating a large number of samples of the same
size using the fitted non-stationary GEV distribution. For each time $t_i$, a new value
$\tilde{x}_{t_i}$ is generated from a GEV$(\hat{\mu}(t_i), \hat{\sigma}(t_i), \hat{\xi})$ distribution.
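The fixed-t and parametric schemes can be sketched as follows, assuming the fitted parameters are already available as lists `mu` and `sigma` (one value per time $t_i$) and a constant shape `xi` different from 0. The function names are hypothetical, and the model refitting of step (iv) is deliberately left out. The two transforms follow Eqs. (8.5) and (8.6).

```python
import random


def cox_snell_residuals(x, mu, sigma, xi):
    """Eq. (8.5): map GEV observations to Exp(1) residuals (requires xi != 0)."""
    return [(1.0 + xi * (xt - m) / s) ** (-1.0 / xi)
            for xt, m, s in zip(x, mu, sigma)]


def rebuild_sample(eps, mu, sigma, xi):
    """Eq. (8.6): map residuals back to the original, non-stationary scale."""
    return [m + s / xi * (e ** (-xi) - 1.0)
            for e, m, s in zip(eps, mu, sigma)]


def fixed_t_resample(x, mu, sigma, xi, rng):
    """Steps (i)-(iii): resample the residuals with replacement, keep the times fixed."""
    eps = cox_snell_residuals(x, mu, sigma, xi)
    eps_b = rng.choices(eps, k=len(eps))
    return rebuild_sample(eps_b, mu, sigma, xi)


def parametric_resample(mu, sigma, xi, rng):
    """Parametric bootstrap: fresh Exp(1) residuals pushed through Eq. (8.6)."""
    eps = [rng.expovariate(1.0) for _ in mu]
    return rebuild_sample(eps, mu, sigma, xi)


# Hypothetical fitted values, for illustration only.
rng = random.Random(0)
mu = [35.0 + 0.03 * t for t in range(63)]
sigma = [1.5] * 63
xi = -0.2
x = parametric_resample(mu, sigma, xi, rng)       # stands in for observed maxima
x_boot = fixed_t_resample(x, mu, sigma, xi, rng)  # one bootstrap replicate
```

In a full procedure, each replicate would be refitted with a non-stationary GEV model and the quantities of interest stored, which yields the bootstrap distribution used for the confidence intervals.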




An intermediate method between random-t and fixed-t turns out to be useful in
practice by limiting the increase in uncertainty related to the handling of estimators
in Eqs. (8.5) and (8.6). Instead of resampling from the original sample, it is possible
to produce draws with a stationary distribution,

$$\varepsilon(t_i) = \frac{x_{t_i} - \hat{\mu}(t_i)}{\hat{\sigma}(t_i)},$$

then sample values $\varepsilon_b(t_i)$ (with replacement) within the sequence $(\varepsilon(t_i))_i$, then
directly construct a new sample:

$$\tilde{x}_{t_i} = \hat{\mu}(t_i) + \hat{\sigma}(t_i)\,\varepsilon_b(t_i).$$

With a bootstrap distribution in hand, several techniques for calculating confidence
intervals for the debiased estimator $\hat{\psi} - (\mathbb{E}[\hat{\psi}] - \psi)$ of the parameter (or function
of interest) $\psi$ are available.

(a) Normal approximation. Based on a classical asymptotic approximation by a
normal distribution (cf. Chap. 4) and an estimation of the bias, the $100(1 - \alpha)\%$
confidence interval can be estimated by

$$\left[\hat{\psi} - (\bar{\psi}_b - \hat{\psi}) + z_{\alpha/2}\,\tau(\psi_b),\ \ \hat{\psi} - (\bar{\psi}_b - \hat{\psi}) + z_{1-\alpha/2}\,\tau(\psi_b)\right],$$

where $\hat{\psi}$ is the value estimated using the initial sample, $\bar{\psi}_b$ and $\tau(\psi_b)$ respectively
the mean and standard deviation of the bootstrap estimates, and $z_{\alpha/2}$ the order
$\alpha/2$ quantile of the standard normal distribution.

(b) Nonparametric approach. The boundaries of the $100(1 - \alpha)\%$ confidence
interval are estimated by the order $\alpha/2$ and $1 - \alpha/2$ empirical quantiles of the
bootstrap distribution. We denote $\psi_b$ the variable that has this distribution. As
in the previous method, this approach can be corrected by supposing that the
unknown bias $\hat{\psi} - \psi$ can be estimated by the difference $\bar{\psi}_b - \hat{\psi}$. The original
bootstrap distribution is thus modified by replacing $\psi_b$ with $\psi_b + \hat{\psi} - \bar{\psi}_b$.

(c) Bias-corrected and accelerated (BCa) method. This method, proposed by
Efron [245], allows confidence intervals obtained using the nonparametric
approach (empirical quantiles) to be corrected so as to obtain the correct
skewness* and kurtosis* of the distribution. The boundaries of the $100(1 - \alpha)\%$
confidence interval are estimated by the $q_{\alpha_1}$ and $q_{\alpha_2}$ quantiles of the bootstrap
distribution, with

$$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}\,(\hat{z}_0 + z_{\alpha/2})}\right),$$

$$\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}\,(\hat{z}_0 + z_{1-\alpha/2})}\right),$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution,
and

$$\hat{z}_0 = \Phi^{-1}\left(\frac{1}{M}\sum_{k=1}^{M} \mathbb{1}_{\{\psi_b^{(k)} < \hat{\psi}\}}\right),$$

where $M$ is the number of bootstrap samples and $\hat{a}$ a bootstrap acceleration
constant defined as 1/6 of the skewness of the set of $n$ estimates of $\psi$ produced by
jackknife or leave-one-out, i.e., 1/6 of the skewness of the distribution of the
estimators $\hat{\psi}_{(-i)}$ for $i = 1, \ldots, n$, where the $i$th one of them is obtained by
removing the $i$th observation from the initial sample:

$$\hat{a} = \frac{\sum_{i=1}^{n}\left(\bar{\psi}_{(\cdot)} - \hat{\psi}_{(-i)}\right)^3}{6\left[\sum_{i=1}^{n}\left(\bar{\psi}_{(\cdot)} - \hat{\psi}_{(-i)}\right)^2\right]^{3/2}},$$

where $\bar{\psi}_{(\cdot)} = n^{-1}\sum_{i=1}^{n} \hat{\psi}_{(-i)}$.
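The BCa adjustment can be illustrated with the short sketch below. It assumes that the bootstrap estimates (`boot`) and the leave-one-out estimates (`jack`) have already been computed; the function names and the simple order-statistic quantile are illustrative choices, not a reference implementation.

```python
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()


def bca_levels(estimate, boot, jack, alpha=0.05):
    """Adjusted quantile orders (alpha_1, alpha_2) of the BCa method."""
    # Bias correction: share of bootstrap estimates below the original one.
    # It must lie strictly between 0 and 1 for the inverse cdf to exist.
    frac_below = sum(b < estimate for b in boot) / len(boot)
    z0 = _STD_NORMAL.inv_cdf(frac_below)
    # Acceleration from the jackknife estimates (they must not all be equal).
    jack_mean = mean(jack)
    dev = [jack_mean - j for j in jack]
    accel = sum(d ** 3 for d in dev) / (6.0 * sum(d ** 2 for d in dev) ** 1.5)

    def adjusted(z):
        return _STD_NORMAL.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    z_lo = _STD_NORMAL.inv_cdf(alpha / 2.0)
    z_hi = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return adjusted(z_lo), adjusted(z_hi)


def empirical_quantile(values, order):
    """Crude empirical quantile based on order statistics (no interpolation)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, int(order * len(ordered))))
    return ordered[index]


def bca_interval(estimate, boot, jack, alpha=0.05):
    """BCa confidence interval read off the bootstrap distribution."""
    a1, a2 = bca_levels(estimate, boot, jack, alpha)
    return empirical_quantile(boot, a1), empirical_quantile(boot, a2)
```

When the bootstrap distribution is centered on the original estimate and the jackknife estimates are symmetric, both corrections vanish and the interval reduces to the plain percentile interval of approach (b).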


With the help of a simulation study, Panagoulia et al. [584] showed that, as
in the stationary case [448], the different methods perform similarly, but the
parametric bootstrap with the BCa method produces better confidence intervals. The
fixed-t method has a tendency to produce wider confidence intervals. In terms of 
computational time, the methods are globally equivalent, though the BCa method by 
construction is slightly more costly than the other approaches. 


8.1.3 Limitations 


One of the limits to these approaches is already clearly visible in their descriptions. 
Indeed, the necessity to extrapolate requires the choice of a monotonic trend—often 
linear—which is not necessarily the one that best describes the true evolution of the 
phenomenon. Aside from this, as the future will certainly not look exactly like the 
past, such trends are themselves likely to change. This is why it is not advisable to 
extrapolate too far into the future based on trends detected in the past. One way to 
overcome this type of limitation, favored, for example, in [229], is to consider as 
covariates one or several meteorological variables whose evolution over time can
be determined using climate simulations, instead of time. In doing so however, the 
problem becomes strongly dependent on uncertainties associated with the climate 
modeling and removal of bias from these models. 

Generally speaking, maximum likelihood estimation turns out to be quite sensitive 
to large values found toward the end of a sample, whether observed or simulated. 
Trends identified can, therefore, be overly dependent on such features and become 
less meaningful. If return levels are estimated by applying these approaches, then it is
strongly recommended first not to target an overly distant future, and second to update
the estimates regularly based on newly available observations, typically every 5
to 10 years. Confidence intervals provide only a limited indication of the actual
uncertainty in extrapolating since they are estimated with respect to the model selected
(using the delta method or parametric bootstrap). Further work is still needed on the
study of model uncertainty to obtain practical rules for determining periods over 
which extrapolation is meaningful. 


8.2 Mean-Variance Trend Approach 


The mean-variance trend approach originated from a detailed study of temperature 
evolution undertaken in the founding work [391], which focused notably on moments 
of the distribution and their links with the extreme values. The aim was to find
an alternative to the parametric approach, which suffers from the limitations identified
above.

By nonparametrically estimating the evolution over time of the daily temperature 
(daily mean, minimum, or maximum) in France, Europe, or the Northern Hemisphere 
as a whole, it has been shown that trends in the mean and variance are not independent 
[593]. For instance, in France, when the mean increases in winter, the variance 
decreases; the opposite is true in summer: an increase in the mean is associated with 
an increase in the variance. So as to evaluate the impact of such changes on trends in 
extreme values, a test statistic was constructed to help decide whether the extreme 
values of the standardized variable (i.e., after having removed nonparametric trends 
in the mean and variance) could be considered stationary. 

More generally, let us denote $(X_t)_t$ a sequence of non-stationary random variables
such that $\mathbb{E}[X_t] = m_t$ and $\mathbb{V}[X_t] = s_t^2$. Extending Formula (4.1), the associated
standardized variable is

$$Y(t) = \frac{X(t) - m_t}{s_t},$$

which can be approximated by

$$\hat{Y}(t) = \frac{X(t) - \hat{m}_t}{\hat{s}_t},$$

where $(\hat{m}_t, \hat{s}_t^2)$ are estimators of $(m_t, s_t^2)$.




8.2.1 Methodological Principles of Testing and Modeling 


8.2.1.1 Estimation of the Standardized Variable 


To test the stationarity hypothesis for the extreme values of the variable $Y_t$, it is first
necessary to remove the effects, estimated by $(\hat{m}_t, \hat{s}_t^2)$, corresponding to trends in the
mean and variance. The decision to nonparametrically estimate $(m_t, s_t^2)$ is motivated
by the wish to make as few assumptions as possible on the trends, and to find the best
compromise between smoothing and fitting to observed trends. This nonparametric
smoothing can be performed using a variety of approaches (splines, penalized least 
squares, wavelets, local regression, etc., see Sect. 8.1.1.1). After several tests showed 
that there was not a great difference in results, the LOESS method for local regression 
(Sect. 4.3.2) was retained for looking at temperatures in [593]. 

The subsequent question is which smoothing parameter to use. Usually the optimal 
one is found using cross-validation. However, due to high autocorrelation in the daily 
temperature time series, and the fact that the variance is not constant over time, this 
approach needed to be modified. Indeed, Hoang [391] has proposed the modified 
partitioned cross-validation method for this setting. 


Constructing approximately standardized data [391]. 


(i) Select the dates of the season to be considered across the whole period of
data collection.
(ii) Estimate $m(t)$ using LOESS and an optimal smoothing parameter found
using modified partitioned cross-validation.
(iii) Estimate $s^2(t)$ using LOESS starting from $(X(t) - \hat{m}(t))^2$, with the same
smoothing parameter.
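The three steps can be sketched as below. Note the substitution: a centered moving average stands in for LOESS, and the window half-width is chosen by hand instead of by modified partitioned cross-validation. The function names are hypothetical; the sketch only shows how the same smoother is applied first to the series and then to the squared deviations.

```python
def moving_average(values, half_width):
    """Centered moving average with windows truncated at both ends.

    Stand-in for LOESS (an assumption of this sketch, not the method of [391]).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def standardize(x, half_width):
    """Return the approximately standardized series with the estimated mean and variance."""
    m_hat = moving_average(x, half_width)                    # step (ii)
    sq_dev = [(xt - mt) ** 2 for xt, mt in zip(x, m_hat)]
    s2_hat = moving_average(sq_dev, half_width)              # step (iii), same smoothing
    # A locally constant series would give a zero variance and a division by zero.
    y_hat = [(xt - mt) / s2 ** 0.5 for xt, mt, s2 in zip(x, m_hat, s2_hat)]
    return y_hat, m_hat, s2_hat
```

Swapping `moving_average` for a genuine local regression smoother leaves the rest of the construction unchanged.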


8.2.1.2 Testing Stationarity 


We wish to test the stationarity of the extreme values of the standardized sequence 
Y (t), but there is no unique alternative to the stationarity hypothesis to test. In effect, 
if the extreme values of Y(t) are non-stationary, this can take different forms, e.g., 
a trend (monotone or other), a cycle, or even both at the same time. For this reason, 
the hypothesis has to be tested using simulation. Consider a distance between time 
functions u(t) and v(t), e.g., the L2 norm: 


$$D(u, v) = \left(\int_{t \in [0,T]} \left(u(t) - v(t)\right)^2 \, dt\right)^{1/2},$$

where $[0, T]$ is the period examined in the study. Thus, if a function $u$ is estimated
by $v$, $D(u, v)$ is a measure of the quality of $v$ as an estimator of $u$.

The parameters of the distribution fitted to the extreme values of $Y(t)$ can then
be estimated in two ways: either as nonparametric functions of time as described
in Sect. 8.1.1, or as constants. Suppose that we estimate the extreme values of $Y(t)$
using the block maxima method, fitting a GEV distribution. Then the parameters
$\mu$ and $\sigma$ of this distribution (the shape parameter $\xi$ is still considered constant) can
be estimated either as functions $\tilde{\mu}(t)$ and $\tilde{\sigma}(t)$, or as constants $\hat{\mu}$ and $\hat{\sigma}$. Whether the
stationary extreme values hypothesis is true or not, the nonparametric estimates
$\tilde{\mu}(t)$ and $\tilde{\sigma}(t)$ converge to the true values $\mu$ and $\sigma$ when the sample size tends to
infinity, at a rate which depends on the degree of smoothing.

The situation is different when estimating with constants $\hat{\mu}$ and $\hat{\sigma}$. If the
stationarity hypothesis is true, then they converge to the true values at the rate $\sqrt{n}$, where $n$ is
the sample size. In this case, $D(\hat{\mu}, \tilde{\mu})$ is very close to $D(\mu, \tilde{\mu})$ for very large sample
sizes. On the other hand, if the hypothesis is false, $\hat{\mu}$ converges to a constant which
is different from the true function $\mu(t)$, and the distance $D(\hat{\mu}, \tilde{\mu})$ does not converge to
0, remaining above some positive constant $A$.

This result has also been shown for $D(\hat{\sigma}, \tilde{\sigma})$ by simulation [391]. The idea is
to generate the distributions of $D(\hat{\mu}, \tilde{\mu})$ and $D(\hat{\sigma}, \tilde{\sigma})$ when the hypothesis is true.
For this, we simulate a large number of samples of the same size as the sample
of maxima of $Y(t)$ using the GEV$(\mu_Y, \sigma_Y, \xi)$ distribution fitted to the maxima of
$Y(t)$. For each simulated sample, we fit a GEV distribution with constant parameters
$(\hat{\mu}_b, \hat{\sigma}_b)$ and another with nonparametrically time-evolving parameters $(\tilde{\mu}_b, \tilde{\sigma}_b)$, so as
to eventually obtain distributions for $D(\hat{\mu}_b, \tilde{\mu}_b)$ and $D(\hat{\sigma}_b, \tilde{\sigma}_b)$. These distributions
correspond to the errors made by estimating constant parameters using time-evolving
ones.

Then, if the distances estimated by fitting to the extreme values of Y (t) are within 
these distributions, it means that the distribution of extreme values of Y (t) cannot be 
distinguished from a stationary one, and we accept the stationarity hypothesis for the 
extreme values of Y(t), or more precisely, we cannot reject it. The same procedure 
can be applied in the POT setting for the threshold exceedance frequency and the 
scale parameter of the Pareto distribution. 

Applying this test to numerous temperature time series has shown that in most 
cases, the hypothesis is not rejected [594]. 
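A minimal sketch of the decision step is given below. It assumes that the constant and time-evolving estimates are already available, both for the observed maxima and for each simulated stationary sample. The discretized distance and the one-sided empirical p-value are one possible way of formalizing "being within the simulated distribution"; the names are hypothetical.

```python
import math


def l2_distance(u, v, dt=1.0):
    """Discretized L2 distance between two functions sampled at the same times."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) * dt)


def stationarity_p_value(observed_distance, simulated_distances):
    """Share of simulated distances at least as large as the observed one."""
    exceed = sum(d >= observed_distance for d in simulated_distances)
    return exceed / len(simulated_distances)


def cannot_reject_stationarity(observed_distance, simulated_distances, alpha=0.05):
    """True when the observed distance is compatible with the stationary simulations."""
    return stationarity_p_value(observed_distance, simulated_distances) > alpha


# Illustration with made-up numbers: a constant estimate against a
# slowly varying nonparametric estimate of the location parameter.
mu_constant = [0.0] * 5
mu_smooth = [0.1, 0.0, -0.1, 0.0, 0.1]
d_obs = l2_distance(mu_constant, mu_smooth)
```

The list `simulated_distances` would be filled by repeating the two fits on each sample simulated from the stationary GEV distribution.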


8.3 Redefining Return Levels and Other Risk Criteria 


According to the definition in Sect. 3.1.1.3, a level $x_p$ associated with a return period
$T$ (with $p = 1/T$) corresponds to a level of the natural hazard observed on average
every $T$ years, or equivalently, the average probability per year of exceeding that
level is $p$. However, this definition supposes that the frequency of extreme events
remains the same, independent of the period under consideration, i.e., the stationary
case. How then can we define a return level of $T$ years in the presence of a trend?

Several alternative definitions appropriate to the non-stationary setting have been
proposed in the literature. However, it is not clear that these are satisfactory for 
decision-makers. Thus other criteria have been proposed to aid more practical deci- 
sion making. 


8.3.1 Return Levels in a Non-stationary Setting 


A first definition, proposed by [596], consists in getting as close as possible to the
definition from the stationary setting. In a general setting, suppose that $\Delta$ is a duration
corresponding to the granularity required in the stationary context for a phenomenon
measured by $X_t$. Typically $\Delta$ is 365 d in the common framework where annual,
centennial, millennial, etc., levels are the desired indicators.


Definition 8.1 (Return level in a non-stationary context) The return level in the
non-stationary context, defined starting from time $t_0$, is the level $x_p$ such that the
expected number of exceedances of $x_p$ in the next $T$ periods of length $\Delta$ is 1:

$$\mathbb{E}\left[\#\left\{t : X_t > x_p,\ t_0 \le t \le t_0 + \Delta \cdot T\right\}\right] = 1. \tag{8.7}$$


In the two classical frameworks for extreme values (MAXB and POT), detailed
below, the estimation of the level $x_p$ must be performed numerically, which is also the
case for estimating a confidence interval for the estimator $\hat{x}_p$ produced. As before, it is


8.3.1.1 MAXB Framework 


In the MAXB framework, where each maxima distribution is asymptotically GEV, the
return level is calculated by integrating the distribution over the upcoming period
using the extrapolation of trends identified in the observed past, so that the level $x_p$
satisfies the following properties.

• GEV parameters as functions. When $\xi \neq 0$, the level $x_p$ is such that

$$\frac{1}{n_b}\sum_{t=t_0}^{t_0+\Delta T}\left\{1 - \exp\left(-\left[1 + \frac{\xi}{\sigma(t)}\left(x_p - \mu(t)\right)\right]^{-1/\xi}\right)\right\} = 1, \tag{8.8}$$

where $n_b$ is the number of values by block, and $t_0$ the start date of the extrapolated
period. When $\xi = 0$, Eq. (8.8) becomes, by continuity,

$$\frac{1}{n_b}\sum_{t=t_0}^{t_0+\Delta T}\left\{1 - \exp\left(-\exp\left(-\frac{x_p - \mu(t)}{\sigma(t)}\right)\right)\right\} = 1.$$

• Mean-variance trends. Equation (8.8) can be modified to

$$\frac{1}{n_b}\sum_{t=t_0}^{t_0+\Delta T}\left\{1 - \exp\left(-\left[1 + \frac{\xi}{\sigma_Y}\left(\frac{x_p - m(t)}{s(t)} - \mu_Y\right)\right]^{-1/\xi}\right)\right\} = 1.$$

However, the limitation related to extrapolating trends from the past is still present
(Sect. 8.1.3), even though the fact that these trends were identified using the whole
set of values and not only the highest ones makes them more robust. To overcome
this constraint, it is possible, using the mean-variance trend approach, to estimate
a return level for a specific future period using $x_p = s_f\, y_p + m_f$, where $y_p$ is
the stationary return level of the standardized variable and $m_f$ and $s_f$ the mean
and standard deviation of the variable over that specific period. In the temperature
case, these values can be obtained using climate simulations [593].
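Since Eq. (8.8) has no closed-form solution when the parameters vary, the level has to be found numerically. The sketch below uses bisection and assumes one term per future block (for instance one per year), so that the factor $1/n_b$ is already absorbed. The function names and all parameter values are hypothetical.

```python
import math


def gev_exceedance(x, mu, sigma, xi):
    """P(block maximum > x) under a GEV(mu, sigma, xi) distribution."""
    if xi == 0.0:
        return 1.0 - math.exp(-math.exp(-(x - mu) / sigma))
    z = 1.0 + xi * (x - mu) / sigma
    if z <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 1.0 if xi > 0.0 else 0.0
    return 1.0 - math.exp(-z ** (-1.0 / xi))


def expected_exceedances(x, mus, sigmas, xi):
    """Expected number of blocks whose maximum exceeds x."""
    return sum(gev_exceedance(x, m, s, xi) for m, s in zip(mus, sigmas))


def nonstationary_return_level(mus, sigmas, xi, lo, hi, tol=1e-8):
    """Bisection for the level exceeded once on average over the given blocks."""
    def gap(x):
        return expected_exceedances(x, mus, sigmas, xi) - 1.0

    if gap(lo) < 0.0 or gap(hi) > 0.0:
        raise ValueError("[lo, hi] does not bracket the return level")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:   # still too many expected exceedances: raise the level
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical 30-year horizon with a linear trend in the location parameter.
horizon = 30
mus = [35.0 + 0.03 * t for t in range(horizon)]
sigmas = [1.5] * horizon
level = nonstationary_return_level(mus, sigmas, xi=-0.2, lo=30.0, hi=50.0)
```

With constant parameters the same routine returns the usual stationary return level, which gives a simple way of checking an implementation.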


8.3.1.2 POT Framework 


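In this setting, where each distribution of exceedances is a GPD, the return level $x_p$
is calculated as follows, $\lambda$ denoting the intensity of the Poisson process:

• GPD parameters as functions. When $\xi \neq 0$,

$$\sum_{t=t_0}^{t_0+\Delta T}\left[1 + \frac{\xi}{\sigma(t)}\left(x_p - u\right)\right]^{-1/\xi}\lambda(t) = 1 \tag{8.9}$$

in the case of a fixed threshold, or

$$\sum_{t=t_0}^{t_0+\Delta T}\left[1 + \frac{\xi}{\sigma(t)}\left(x_p - u(t)\right)\right]^{-1/\xi}\lambda = 1 \tag{8.10}$$

in the case of a variable one. When $\xi = 0$, Eqs. (8.9) and (8.10) become, respectively,

$$\sum_{t=t_0}^{t_0+\Delta T}\exp\left(-\frac{x_p - u}{\sigma(t)}\right)\lambda(t) = 1$$

and

$$\sum_{t=t_0}^{t_0+\Delta T}\exp\left(-\frac{x_p - u(t)}{\sigma(t)}\right)\lambda = 1.$$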


e Mean-variance trends. Eq. (8.9) can be modified to 
8.3.1.3 Stationary Return Level for a Given Year


Alternatively, we can suppose that each year constitutes a stationary period and
estimate a different return level for each using the formula given in the stationary
case for each of the approaches. This is how the return.level function from the
R package extRemes is coded; if a trend is identified and taken into account, a
return level is estimated for each year in the observed time series. This function
will not extrapolate trends beyond the observed period. However, this can still be
done by hand: by extrapolating the detected trends, we obtain parameter values for
a chosen future year and deduce the corresponding return level of interest.


8.3.2 Other Risk Criteria in Non-stationary Settings 


Various authors of recent publications have returned to these notions of return levels
and periods in the non-stationary setting, insisting that the protection of
installations cannot be planned when a different value is calculated for each year [792].
They therefore point to the need to define a notion of risk over the whole lifetime
(or period of operation) of installations requiring protection. This period is denoted
$[T_1, T_2]$.

In this setting, several definitions have been proposed, besides Definition 8.1, and
occasionally compared [593, 596]. We summarize them in the list below. Suppose we
are in the functional setting of Sect. 8.1, where $\psi_t$ designates a vector of parameters
of a non-stationary extreme value distribution of the block maxima type, and
recall the notation $F(x|\psi_t)$ for the cumulative distribution function of the extreme
values of the phenomenon under investigation. The probability of exceeding a level
$x_q$ is then written:

$$p_t = 1 - F(x_q|\psi_t).$$

Suppose finally that the data are independent.


1. Expected Waiting Time (EWT) [575, 682]: the idea is to estimate the mean
waiting time between successive exceedances of an extreme level. If a value $x_{p_0}$
is exceeded in a given year $t_0$, the (discretized) time $T$ until the next exceedance
follows, when the exceedance probability is constant and equal to $p_0$, a geometric
distribution with mass function (density)

$$f(t) = P(T = t) = (1 - p_0)^{t-1} p_0,$$

and, with time-varying exceedance probabilities $p_t$, the mean waiting time is then

$$\mathrm{EWT} = \mathbb{E}[T] = \sum_{t=1}^{\infty} t\, f(t) = \sum_{t=1}^{\infty} t\, p_t \prod_{t'=1}^{t-1}\left(1 - p_{t'}\right).$$

Estimation of the EWT requires however an estimate of the evolution of the
exceedance probability up to infinity, which limits the application of this definition
in practice. Cooley [657] and Salas and Obeysekera [682] proposed stopping the
sum at a value $t_{\max}$ corresponding to the time at which the probability of exceeding
$x_{p_0}$ is 1 for time series with increasing trends or 0 for those with decreasing trends.

2. Reliability (RE) [682]: this concept is based on the probability of not exceeding the
value $x_q$ over the operating period $[T_1, T_2]$, denoted $\mathrm{RE}^{ns}_{T_1,T_2}$, where $ns$ signifies
non-stationary. This value can be estimated by

$$\mathrm{RE}^{ns}_{T_1,T_2} = \prod_{t=T_1}^{T_2} F(x_q|\psi_t).$$

3. Design Life Level (DLL) [665]: the principle is the same as above, but the value
is used to estimate a dimensioning value for a future project, rather than to verify
the robustness of an existing installation. The value is thus estimated as the level
whose probability of non-exceedance during $[T_1, T_2]$ is equal to $1 - 1/T$, where
$T$ is a period of interest, equated to a classical return period in the stationary case.
Thus,

$$x^{\mathrm{DLL}} = F^{-1}_{T_1:T_2}\left(1 - 1/T\right), \quad \text{with } F_{T_1:T_2}(x) = \prod_{t=T_1}^{T_2} F(x|\psi_t).$$

4. Equivalent Reliability (ERE) [479]: inspired again by concepts used in the
reliability of structures, [479] has proposed estimating, over the operating period
of an installation and in the non-stationary context, the level whose probability
of non-exceedance is the same as that for the stationary context. In the latter, the
probability of non-exceedance of a given level is the same over each unit of time
(typically a year), and as extreme value distributions are fitted on independent
events, its value is

$$P^{s}_{T_1:T_2} = \left(1 - 1/T\right)^{T_2 - T_1 + 1}.$$

The level $x^{\mathrm{ERE}}$ being looked for thus solves the following equation:

$$\prod_{t=T_1}^{T_2} F\left(x^{\mathrm{ERE}}|\psi_t\right) = \left(1 - 1/T\right)^{T_2 - T_1 + 1}.$$

5. Average Design Life Level (ADLL) [479]: these authors also define the level
$x^{\mathrm{ADLL}}$ exceeded on average $1/T$ times over the period $[T_1, T_2]$, estimated by the
solution of the equation:

$$\frac{1}{T_2 - T_1 + 1}\sum_{t=T_1}^{T_2}\left\{1 - F\left(x^{\mathrm{ADLL}}|\psi_t\right)\right\} = \frac{1}{T}.$$

This definition is equivalent to (8.7) if the operating period is equal to $T$. With this
notation, the return level $x_p$ proposed in Sect. 8.3.1 can be estimated by solving

$$\sum_{t=T_1}^{T_2}\left\{1 - F(x_p|\psi_t)\right\} = 1.$$
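The criteria in the list above can be compared numerically with a short sketch. It assumes GEV block maxima with one parameter triple per year of the operating period and a shape parameter different from 0; the function names are hypothetical, and the equations are solved by simple bisection.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """Non-exceedance probability F(x | psi_t) with psi_t = (mu, sigma, xi), xi != 0."""
    z = 1.0 + xi * (x - mu) / sigma
    if z <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / xi))


def reliability(x, params):
    """RE: probability that x is never exceeded over the operating period."""
    return math.prod(gev_cdf(x, *psi) for psi in params)


def mean_exceedance_probability(x, params):
    """Average yearly probability of exceeding x over the operating period."""
    return sum(1.0 - gev_cdf(x, *psi) for psi in params) / len(params)


def bisect_increasing(func, target, lo, hi, tol=1e-8):
    """Solve func(x) = target for a non-decreasing func on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def design_life_level(params, T, lo, hi):
    """DLL: reliability over the whole period equal to 1 - 1/T."""
    return bisect_increasing(lambda x: reliability(x, params), 1.0 - 1.0 / T, lo, hi)


def equivalent_reliability_level(params, T, lo, hi):
    """ERE: same reliability as a stationary T-year level over the period."""
    target = (1.0 - 1.0 / T) ** len(params)
    return bisect_increasing(lambda x: reliability(x, params), target, lo, hi)


def average_design_life_level(params, T, lo, hi):
    """ADLL: average yearly exceedance probability equal to 1/T."""
    # The exceedance probability decreases with x, so its negative is increasing.
    return bisect_increasing(
        lambda x: -mean_exceedance_probability(x, params), -1.0 / T, lo, hi)


def expected_waiting_time(p_seq):
    """Truncated EWT: sum over t of t * p_t * prod over t' < t of (1 - p_t')."""
    total, survival = 0.0, 1.0
    for t, p in enumerate(p_seq, start=1):
        total += t * p * survival
        survival *= 1.0 - p
    return total
```

In the stationary case the ERE and ADLL levels coincide with the classical $T$-year return level, while the DLL is higher because it constrains the whole operating period at once.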


8.4 Case Studies 


8.4.1 Extreme Temperatures 


To illustrate the different approaches, we consider the time series of maximum daily 
temperatures observed in Rennes (Brittany, France) over the period 1951-2013, made 
available by the European Climate Assessment and Dataset project (http://eca.knmi.nl).
The aim of this project was to construct a database of observations across Europe
for studying climate change (notably in terms of homogeneity), which has been 
regularly updated ever since. We use this database here to estimate a 30-year return 
level for high temperatures for the period 2014-2043, taking into account observed 
trends. 


8.4.1.1 Analysis of the Time Series 


Before applying extreme value theory, it is necessary to get an idea of the range of 
temperatures seen. The minimum was —7.5 °C on 12 January 1987, and the maximum 
was 39.5 °C on 5 August 2003 (remember that these are daily maxima; the winter of 
1987 was particularly cold!). The highest extreme values occur of course in summer,
as can be seen in Table 8.2, so only the temperatures recorded in the months of
June, July, and August will be taken into account in our estimations. 

Can we then accept that measurements made at each summer (from June to 
August) are equidistributed? Figure 8.3 summarizes the result that linear trends in 
the evolution of the means and maxima can be seen over time. We can test the signifi- 
cance of these trends using a nonparametric Mann-Kendall test (to avoid making any 
hypotheses on the type of distribution) with a 90% confidence level (cf. Sect. 5.4.3.2). 
In both cases, the increasing trend is significant. There is thus a warming trend: the 
Table 8.2 For each month, the number of annual maxima (first row) over the period of observation,
two largest values from each year (second row), and three largest values from each year (third row).
The numbers between brackets are the expected number in the case of an equal distribution across
all months

                     January   February   March   April   June
Max (5.25)           0         0          0       0       8
2 largest (10.5)     0         0          0       0       22
3 largest (15.75)    0         0          0       0       33

                     July   August   September   October   December
Max (5.25)           29     24       2           0         0
2 largest (10.5)     54     43       5           0         0
3 largest (15.75)    78     64       10          0         0


temperatures reached at the end of the period are not similar to those which were 
observed at the beginning of this period. 
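For reference, a basic version of the Mann-Kendall test can be sketched as follows. This simplified form assumes a series without ties and without serial correlation, and uses the normal approximation with continuity correction; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test.

    Simplifying assumptions: no correction for ties or autocorrelation.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value
```

With a 90% confidence level, a trend is declared significant when the returned p-value is below 0.10.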


8.4.1.2 Parametric Trend Approach 


The first approach consists in fitting an extreme value distribution to the highest
values while supposing that this distribution’s parameters may be time-dependent
(excluding the shape parameter $\xi$). If we apply the block maxima approach,
taking into account each summer’s maximum, we successively test the following
cases using the likelihood ratio²:

• a linear trend for the location parameter, with a constant scale parameter, against a
choice of constant parameters: $(\mu_0, \mu_1, \sigma_0)$ against $(\mu_0, \sigma_0)$;

• a constant location parameter and a linear trend in the scale parameter against a
choice of constant parameters: $(\mu_0, \sigma_0, \sigma_1)$ against $(\mu_0, \sigma_0)$;

• if the linear trend is accepted for the location parameter, this is tested against a
linear trend in each of the parameters: $(\mu_0, \mu_1, \sigma_0, \sigma_1)$ against $(\mu_0, \mu_1, \sigma_0)$;

• if a linear trend is accepted for the scale parameter, it is tested against a linear trend
in each of the parameters: $(\mu_0, \mu_1, \sigma_0, \sigma_1)$ against $(\mu_0, \sigma_0, \sigma_1)$.

Table 8.3 summarizes the results obtained for the negative log-likelihood of each
model and the AIC and BIC criteria. All of the methods select a model with a linear
location parameter and a constant scale parameter $(\mu_0, \mu_1, \sigma_0)$.

² For example with the lr.test function in the R package extRemes.

Fig. 8.3 Evolution of the means (top) and maxima (bottom) of maximum daily temperatures in
June, July, and August from 1951 to 2013. The word “significant” in the top-left of each plot means
that the trends are significant at the 90% level according to Mann-Kendall tests

Table 8.3 Comparison of different trend models for the parameters of the GEV distribution fitted
to the maxima of each summer: negative log-likelihood (nllh), deviance D compared with the $\chi^2$
value for 1 degree of freedom (=3.841) and a type I error rate of 0.05, AIC and BIC criteria.

$(\mu_0, \sigma_0)$   $(\mu_0, \mu_1, \sigma_0)$   $(\mu_0, \sigma_0, \sigma_1)$   $(\mu_0, \mu_1, \sigma_0, \sigma_1)$
nllh 136.47
D/$\chi^2$(5%) 0.58
AIC 282.94
BIC 293.66

If we adopt the POT approach for exceeding thresholds, it is necessary to test the
hypothesis: “constant threshold and a search for a trend in the intensity $\lambda$ of the
Poisson process and the scale parameter of the Pareto distribution” (assuming again that
the shape parameter is constant).³ With a threshold of 31.5 °C (corresponding to the
97% quantile of the summer temperature distribution), considered suitable according
to the criteria of a constant shape parameter and linearity in the mean of the
exceedances, the various models can be tested in the same way:


• constant threshold, linear intensity: $(u_0, \lambda_0, \lambda_1)$ against $(u_0, \lambda_0)$;

• constant threshold, linear scale parameter: $(u_0, \sigma_0, \sigma_1)$ against $(u_0, \sigma_0)$;

• linear threshold, linear scale parameter: $(u_0, u_1, \sigma_0, \sigma_1)$ against $(u_0, u_1, \sigma_0)$; the
linear threshold (obtained using quantile regression) being retained if the intensity
$\lambda$ of the Poisson process is constant (found by testing $(u_0, u_1, \lambda_0, \lambda_1)$ against
$(u_0, u_1, \lambda_0)$).


The test results, summarized in Table 8.4, show that the optimal models either 
have a constant threshold, linear Poisson process intensity, and constant Pareto scale 
parameter, or a linear threshold and a constant scale parameter. 

As we saw in Sect. 8.3.1, there are several ways to estimate the return level in the 
presence of identified trends, and several ways to calculate confidence intervals. 

First, let us consider the GEV setting with the model $(\mu_0, \mu_1, \sigma_0)$ and the definition
of the targeted return level as the level exceeded once on average over the next
30 years. This leads to a return level of 39.4 °C. The 95% confidence interval can
be estimated by random-t bootstrap (using the stationary residuals) or parametric
bootstrap. The results are quite similar:


• nonparametric bootstrap: [37.6 °C, 41.5 °C];
• parametric bootstrap: [37.5 °C, 41.1 °C].


Table 8.5 shows various estimates of the targeted return level depending on the 
approach taken (MAXB or POT) and the definition of the return level used. The 
95% confidence intervals have been estimated using nonparametric bootstrap. The 
competing approaches do not lead to the same values. As shown by [593], evolution
in the position parameter $\mu(t)$ of the GEV distribution has more impact on the return
level than evolution in the intensity $\lambda(t)$ of the Poisson process in the POT approach.
In the latter, it is the threshold that plays the equivalent role to the position parameter, 


3 This hypothesis is not available in the R package extRemes and was coded specifically for this 
example. 
Table 8.4 Comparison of various trend models for parameters of the Poisson process and Pareto
distribution fitted to exceedances of a threshold corresponding to the 97% quantile each summer:
negative log-likelihood (nllh), deviance D compared with the $\chi^2$ value for 1 degree of freedom
(=3.841) and a type I error rate of 0.05, AIC and BIC criteria, in the constant and linear threshold
cases.


Constant threshold   $(u_0, \lambda_0)$        $(u_0, \lambda_0, \lambda_1)$        $(u_0, \sigma_0)$        $(u_0, \sigma_0, \sigma_1)$
nllh                 480.14                    472.58                               158.06                   157.49
D/$\chi^2$(5%)                                 15.13                                                         1.13
AIC                  962.29                    949.15                               320.12                   320.99
BIC                  968.94                    962.46                               325.20                   328.62

Linear threshold     $(u_0, u_1, \lambda_0)$   $(u_0, u_1, \lambda_0, \lambda_1)$   $(u_0, u_1, \sigma_0)$   $(u_0, u_1, \sigma_0, \sigma_1)$
nllh                 488.37                    487.63                               153.05                   152.69
D/$\chi^2$(5%)                                 1.478                                                         0.72
AIC                  978.74                    979.27                               310.10                   311.38
BIC                  985.39                    992.57                               315.23                   319.08


Table 8.5 30-year return levels and 95% confidence intervals obtained by nonparametric bootstrap
for the two extreme value methods (block maxima MAXB and threshold exceedance POT) for
different non-stationary models and different definitions of future return levels


Approach   Model                                     Definition                  Return level   95% confidence interval
MAXB       $(\mu_0, \mu_1, \sigma_0)$                Once in the next 30 years   39.4 °C        [37.6 °C, 41.5 °C]
MAXB       $(\mu_0, \mu_1, \sigma_0)$                Stationary in 30 years      40.0 °C        [38.0 °C, 42.2 °C]
POT        $(u_0, \lambda_0, \lambda_1, \sigma_0)$   Once in the next 30 years   38.4 °C        [36.5 °C, 38.8 °C]
POT        $(u_0, \lambda_0, \lambda_1, \sigma_0)$   Stationary in 30 years      39.2 °C        [36.4 °C, 40.8 °C]
POT        $(u_0, u_1, \lambda_0, \sigma_0)$         Stationary in 30 years      40.4 °C        [38.2 °C, 42.5 °C]


and taking into account a variable threshold leads to results that are more coherent 
with an evolution in the position parameter of the GEV distribution. 
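
The first return level definition used above (level exceeded once in expectation over the next 30 years) can be made concrete under an assumption: that it is the level $z$ for which the yearly exceedance probabilities, summed over the next 30 years, equal 1. The sketch below solves this by bisection for a GEV with a linear location parameter. All parameter values are hypothetical, not the fitted Rennes values, and the exact definition of Sect. 8.3.1 may differ in detail.

```python
import math


def gev_exceedance_prob(z, mu, sigma, xi):
    """P(block maximum > z) under a GEV(mu, sigma, xi) distribution."""
    if abs(xi) < 1e-12:  # Gumbel limit
        return 1.0 - math.exp(-math.exp(-(z - mu) / sigma))
    s = 1.0 + xi * (z - mu) / sigma
    if s <= 0.0:
        # Outside the support: above the upper endpoint (xi < 0)
        # or below the lower endpoint (xi > 0).
        return 0.0 if xi < 0 else 1.0
    return 1.0 - math.exp(-(s ** (-1.0 / xi)))


def expected_exceedances(z, mu0, mu1, sigma, xi, years):
    """Expected number of yearly maxima above z when mu(t) = mu0 + mu1 * t."""
    return sum(gev_exceedance_prob(z, mu0 + mu1 * t, sigma, xi) for t in years)


def level_exceeded_once(mu0, mu1, sigma, xi, years, lo, hi, tol=1e-9):
    """Bisection on z; [lo, hi] must bracket the solution."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_exceedances(mid, mu0, mu1, sigma, xi, years) > 1.0:
            lo = mid  # still too many expected exceedances: raise the level
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters; t counts years ahead of the trend origin.
future_years = range(1, 31)
z_trend = level_exceeded_once(34.0, 0.03, 1.8, -0.2, future_years, 30.0, 60.0)
z_flat = level_exceeded_once(34.0, 0.0, 1.8, -0.2, future_years, 30.0, 60.0)
print(z_trend, z_flat)
```

With no trend (`mu1 = 0`) the result coincides with the usual stationary 30-year return level, which gives a quick sanity check on the implementation.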


8.4.1.3 Mean-Variance Approach


To apply the alternative approach presented in Sect. 8.2, it is necessary to construct
the standardized variable by estimating the evolution of the mean and variance of
summer temperatures nonparametrically. Figure 8.4 displays these trends, estimated
by LOESS (local regression), where the smoothing parameter has been chosen by optimizing
a modified partitioned cross-validation criterion proposed in [391].

Fig. 8.4 Evolution of the means (top) and standard deviations (bottom) of the maximum daily
temperature recorded in Rennes from 1951 to 2013. The smoothing obtained using LOESS is
superimposed on the curve obtained by attributing to each day the mean temperature or the standard
deviation of the temperature of the summer it came from
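
The construction of the standardized variable can be sketched as follows. This is only an illustration: a plain tricube-weighted local average is swapped in for LOESS, the bandwidth is fixed by hand rather than chosen by cross-validation, and the toy series is invented. The functions `kernel_smooth` and `standardize` are hypothetical names.

```python
import math


def kernel_smooth(values, bandwidth):
    """Local weighted average with tricube weights over |i - j| < bandwidth.

    A simplified stand-in for LOESS (local constant fit, not local regression).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - bandwidth + 1), min(n, i + bandwidth)):
            w = (1.0 - (abs(i - j) / bandwidth) ** 3) ** 3
            num += w * values[j]
            den += w
        smoothed.append(num / den)
    return smoothed


def standardize(x, bandwidth):
    """Return (m, s, y) with y_t = (x_t - m_t) / s_t.

    m is the smoothed mean and s the square root of the smoothed squared
    deviation. A constant series would give s = 0 and is not handled here.
    """
    m = kernel_smooth(x, bandwidth)
    sq_dev = [(xt - mt) ** 2 for xt, mt in zip(x, m)]
    s = [math.sqrt(v) for v in kernel_smooth(sq_dev, bandwidth)]
    y = [(xt - mt) / st for xt, mt, st in zip(x, m, s)]
    return m, s, y


# Toy series: values alternating around a constant level.
toy = [10.0 + 2.0 * (-1) ** t for t in range(60)]
m, s, y = standardize(toy, bandwidth=10)
print(y[28:32])
```

Away from the edges of the series, the standardized toy values have unit magnitude, as expected for a series whose fluctuations have constant amplitude.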

The test for stationarity in the extreme values of the standardized variable sug- 
gests that no significant evolution can be detected in the parameters of the fitted 
extreme value distributions for either the maxima in blocks or threshold exceedance 
approaches. 

The return levels estimated by extrapolating the linear trends seen according to one 
or the other of the previously proposed definitions (level exceeded in expectation at 
least once in the next 30 years, or return level estimated under stationarity in 30 years) 
are summarized in Table 8.6. The confidence intervals are estimated using: 


• nonparametric bootstrap for the level exceeded in expectation at least once in the next 30 years;

• the delta method, adding the contributions of the mean and standard deviation to
the variance-covariance matrix and the Jacobian matrix, in the case of the stationary
return level in 30 years.


The results obtained are very similar regardless of the approach and definition 
chosen. The sensitivity seen in the previous section, due to the estimation of trends 
in the parameters of extreme value distributions, is no longer visible. It should be 
noted that this method can be used to estimate future means and variances for more 
distant horizons—such as the end of the 21st century—from climate simulations 
run in international scientific projects supporting the Intergovernmental Panel on 
Climate Change (IPCC). 
Table 8.6 30-year return levels and 95% confidence intervals for the two proposed definitions,
estimated using stationarity of the extreme values of the standardized variable and evolution in the
mean and standard deviation of summer temperatures in terms of the linear trend identified in the
time series.


Approach   Definition                  Return level   95% confidence interval
MAXB       Once in the next 30 years   40.0 °C        [38.5 °C, 43.0 °C]
MAXB       Stationary in 30 years      40.2 °C        [38.6 °C, 41.8 °C]
POT        Once in the next 30 years   40.2 °C        [38.2 °C, 42.1 °C]
POT        Stationary in 30 years      40.4 °C        [39.3 °C, 41.4 °C]


Fig. 8.5 The Extremadura region, Spain


8.4.2 Extreme Rainfall 


These approaches can be applied to other variables that potentially have trends. Rainfall
in France does not as yet exhibit significant trends, unlike in Spain. Let us consider
a time series of measurements from the Jaraiz de la Vera station in the Caceres
province, in the northeast of the Extremadura region (Fig. 8.5), provided by the
Spanish State Meteorological Agency (AEMET). These data are daily cumulative
rainfall from 1961 to 2010, with no missing values. Their quality has been verified by
Javier Acero, researcher at the University of Extremadura, and the following results
come from a study run in collaboration with EDF R&D [22].


8.4.2.1 Analysis of the Time Series 


The largest observed cumulative daily rainfall was 176 mm on 2 February 1972. The
distribution of the maxima (annual maximum along with two or three of the largest 
values) shows that October to February are the rainiest months (Table 8.7). This part 
of the year was, therefore, chosen to study the extreme values. 
Table 8.7 Number of yearly maxima (first row), number of two largest values each year (second
row), and number of three largest values each year (third row) for each month. The numbers in
brackets are the expected numbers if the results were equidistributed over the set of months


February

Max (4.17)         2
2 largest (8.33)   18   13   7   5   4   3
3 largest (12.5)   23   19   8   6   7   3

                   July   August   September   October   November   December
Max (4.17)         0      0        3           7         11         5
2 largest (8.33)   0      0        3           17        15         15
3 largest (12.5)   0      0        7           28        23         26


Rain has the particularity of being an intermittent variable. Two (stochastic) pro- 
cesses can be used to characterize it: one is binary (it rains or does not rain), while 
the other is continuous (measuring the actual quantity of rainfall). See Sect. 10.2.2 
for more details. In this setting, the search for trends is trickier. For example, if
we look for a trend in the seasonal mean rainfall using the daily data, i.e., including
the 0s (absence of rain) in the mean, the slight downward trend is not significant.
On the other hand, if rainy days are separated from days with no rainfall, significant 
trends can be seen in the mean, variance, and number of days with rain. Incidentally, 
these are opposing trends in that the mean and variance decrease, while the number 
of days with rain increases (Fig. 8.6). It, therefore, rains less but more often (hence 
the absence of a significant trend when we take into account all of the days). 

For studying the extreme values, we will thus prefer to use the threshold 
exceedance (POT) approach as it can better take into account such intermittent data. 
In effect, threshold exceedances obviously only occur on days with rain, but it is easy 
to pass from an estimated return level using only days with rain to the same return 
level estimated using all days of the season. The difference is in the frequency of 
threshold exceedance, which either involves only days with rain: 

$$\lambda_r = \frac{n_u}{n_r},$$

where $n_u$ is the number of independent exceedances of a threshold $u$ and $n_r$ the
number of days it rained, or instead involves all days in the season:

$$\lambda = \frac{n_u}{n}.$$

Fig. 8.6 Evolution of the seasonal means calculated using all data (top), and seasonal means,
variances, and numbers of days with rain when considering only days where it rained (bottom)

Fig. 8.7 Observed evolution in the amount of rainfall over the threshold (top) and the frequency
of threshold exceedance (bottom) between October and February


Independence is dealt with as before for the temperature data; we consider that
two events are independent if separated by at least one day with rainfall below
the threshold. This choice has been justified by several tests. As the number of
exceedances is the same in both cases, we can pass from $\lambda_r$ to $\lambda$ by
multiplying by the fraction of days it rained, $n_r/n$. Then, if we examine the evolution
in exceedances of a high threshold (here set at the 98% quantile of the distribution
of the values from the season in question, i.e., 52.3 mm), no trend is detected in
the quantity of rain above the threshold; however, the frequency of exceeding the
threshold decreases (Fig. 8.7). It is therefore valid to implement methods that can
deal with trends when examining the extreme values of this time series. Here, we
will try to estimate a 20-year return level in 2020 (10 years after the end of the time
series).
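
The independence rule and the two exceedance frequencies can be illustrated on a toy series. In this minimal sketch, keeping the largest value of each cluster is an assumption (the text only states when two events are considered independent), and the rainfall values are invented; only the 52.3 mm threshold comes from the text.

```python
def cluster_peaks(rain, threshold):
    """Keep one value (the largest) per run of consecutive exceedances.

    A run ends as soon as one day falls at or below the threshold.
    """
    peaks = []
    current = None
    for amount in rain:
        if amount > threshold:
            current = amount if current is None else max(current, amount)
        elif current is not None:
            peaks.append(current)
            current = None
    if current is not None:  # series ends inside a cluster
        peaks.append(current)
    return peaks


def exceedance_frequencies(rain, threshold):
    """Return (lambda_r, lambda): frequency per rainy day and per day.

    Assumes the series contains at least one rainy day.
    """
    n_u = len(cluster_peaks(rain, threshold))
    n_r = sum(1 for amount in rain if amount > 0)  # days with rain
    n = len(rain)                                  # all days
    return n_u / n_r, n_u / n


# Invented daily rainfall amounts (mm).
toy_rain = [0, 60, 70, 0, 5, 55, 0, 0, 80, 3]
lam_r, lam = exceedance_frequencies(toy_rain, threshold=52.3)
print(cluster_peaks(toy_rain, 52.3), lam_r, lam)
```

Here the two frequencies differ exactly by the factor $n_r/n$, the fraction of days with rain.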
Fig. 8.8 Variable threshold (top) when keeping independent exceedances, and the optimal evolution
identified for the scale parameter of the exponential distribution (bottom)


8.4.2.2 Trends in Extreme Value Distribution Parameters 


A trend showing a decrease in the frequency of threshold exceedance was identified
in the previous section. As for temperature, identifying this trend can be done using
likelihood ratio tests. However, as it is a decreasing trend, extrapolating it risks
producing frequencies of 0 or even negative values. For this reason, it is judged
preferable to have a variable threshold (thus rendering the exceedance frequency
stationary) and look for a possible trend in the scale parameter of the GPD.
As an aside, a likelihood ratio test can also be used to check whether the GPD can
be simplified to an exponential distribution (i.e., a shape parameter $\xi = 0$). This
test indicates that in this case, the shape parameter (-0.09) cannot be significantly (at
5%) distinguished from 0 (unlike the temperature example, where this test actually
concludes that the shape parameter is non-zero; details not shown). The test results
indicate that with a variable threshold estimated by quantile regression (98%) and
working with all of the days, the scale parameter of the exponential distribution can
be considered constant. Figure 8.8 illustrates this.

For this situation, a 20-year return level in 2020 was estimated using the second
definition (stationary level in 2020) at 123.5 mm, with a 95% confidence interval of
[95.5 mm, 151.7 mm].
Fig. 8.9 Empirical evolution in the seasonal means and standard deviations of number of days with
rain (dashed line) and the estimated nonparametric trend (solid line)


8.4.2.3 Mean-Variance Approach 


So as not to mix up the two processes described in Sect. 8.4.2.1 when estimating
the mean and variance, these two will be estimated using only the non-zero rainfall
values, in the presence of a supplementary variable giving the number of days of
rainfall per season. The mean number of days of rainfall per season is only 43.5, far
from the 151 days in total. Note in passing that the probabilistic approximation made if
we were to use a block maxima approach would be misleading since the maximum
supposedly taken over 151 days is in reality only taken over 43.5 days on average. This
further justifies our preference for a threshold exceedance approach in this specific case.

The threshold is then chosen as the 95% quantile of the set of values from days with
rain and is equal to 60.2 mm. The shape parameter of the GPD (= -0.03)
is not significantly different from 0, so the simple exponential distribution will be
retained. Finally, 101 independent values are found over the threshold.

Nonparametric trends in the mean and variance are identified in the same way 
as for the temperature data and plotted in Fig. 8.9. These allow us to calculate the 
standardized variable $Y_r$ (where $r$ is a reminder that we are only using days on which
it rained), which we test for stationarity in its extreme values. More precisely, the 
intensity of the Poisson process and the scale parameter of the exponential distribution 
are tested for being constants, following the procedure described in Sect. 8.2.1.2. 
The p-values obtained are 0.48 for the Poisson process intensity and 0.88 for the 
scale parameter of the exponential distribution; these results support the stationary 
hypothesis. 

The future return level can then be estimated using the hereditary property of
GPD distributions (Proposition 6.3). Indeed, if $v$ is the threshold defined for the
standardized variable $Y_r$, then $Y_r > v$ is equivalent to $X_r > w$, with

$$w = s_r \cdot v + m_r,$$

where $m_r$ and $s_r$ are respectively the mean and standard deviation at the targeted date.
Then, if $Y_r$ follows a GPD above the threshold $v$ with parameters $\xi_v$ and $\sigma_v$ and an
intensity $\lambda_{r,v}$ for the Poisson process, $X_r$ also follows a GPD above the threshold $w$
with parameters:

$$\xi_w = \xi_v \quad (= 0 \text{ in our case}),$$
$$\sigma_w = \sigma_v \cdot s_r,$$
$$\lambda_{r,w} = \lambda_{r,v},$$

where $\lambda_{r,w}$ is the exceedance frequency calculated using only the days on which it
rained.
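
As a check on the parameter mapping, here is a short derivation sketch. It assumes that the standardization is the affine map $X_r = s_r Y_r + m_r$ with $s_r > 0$ held fixed at the targeted date, and it takes $x > 0$:

```latex
\begin{align*}
P(X_r > w + x \mid X_r > w)
  &= P\!\left(Y_r > v + \tfrac{x}{s_r} \,\middle|\, Y_r > v\right) \\
  &= \left(1 + \xi_v \frac{x}{s_r\,\sigma_v}\right)^{-1/\xi_v}
   = \left(1 + \xi_v \frac{x}{\sigma_w}\right)^{-1/\xi_v},
  \qquad \sigma_w = \sigma_v\, s_r .
\end{align*}
```

The shape parameter is unchanged and only the scale is multiplied by $s_r$. In the exponential case ($\xi_v = 0$) the right-hand side is read as its limit, $\exp(-x/\sigma_w)$. The exceedance frequency is unchanged because the events $\{Y_r > v\}$ and $\{X_r > w\}$ coincide.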

To estimate the return level in 2020, we must estimate the mean and variance 
at that future time, as well as the number of days with rain, to then be able to get 
back to a return level over all days in the season. These estimates are made here 
by extrapolating the linear trends identified earlier in the mean and variance in the 
amount of rainfall (on days with rain), as well as in the number of days with rain 
(alternatively, they could have been obtained using climate simulations). Here we 
find $m_r = 12.9$, $s_r = 16.7$, and $nb_r = 50.6$.

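The return level is then estimated using Formula (6.7):

$$x_N = w + \frac{\sigma_w}{\xi_w}\left[(N\, n_y\, \lambda_w)^{\xi_w} - 1\right],$$

where $N$ is the required return period (20 years here), $n_y$ the number of values per
year (per season here, i.e., 151), and $\lambda_w$ the exceedance frequency of the threshold
$w$ when considering all of the days. When $\xi_w = 0$, as here, the formula is read as its
limit, $x_N = w + \sigma_w \log(N\, n_y\, \lambda_w)$. The frequency $\lambda_w$ is calculated using:

$$\lambda_w = \lambda_{r,w}\, \frac{nb_r}{n_y}.$$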

We implement a nonparametric bootstrap method consisting of (in this specific case)
repeating the following steps a great number of times:


1. Generate a sequence of the number of days with rain each season by adding
   a random perturbation around the linear trend identified.

2. Associate with each day of rain a quantity estimated using a bootstrap
   sample by blocks (of 10 days, to control temporal correlation) of $Y_r$ and the
   nonparametric trends in the mean and variance.

3. Deduce a 20-year return level for 2020 using the same method used for the
   original sample.
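
The resampling core of such a procedure can be sketched as follows. This is a deliberately simplified illustration: it only covers block resampling of a series and a percentile interval for a generic statistic, and it leaves out the perturbation of the number of rainy days and the re-application of the trends described in steps 1 and 2. All function names are hypothetical.

```python
import math
import random


def block_bootstrap(series, block_len, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks.

    Requires len(series) >= block_len.
    """
    n = len(series)
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n - block_len + 1)
        resampled.extend(series[start:start + block_len])
    return resampled[:n]


def percentile_interval(estimates, level=0.95):
    """Percentile confidence interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[math.floor(alpha * last)]
    upper = ordered[math.ceil((1.0 - alpha) * last)]
    return lower, upper


def bootstrap_interval(series, statistic, block_len=10, n_rep=1000, seed=0):
    """Repeat the resampling n_rep times and summarize the statistic."""
    rng = random.Random(seed)
    estimates = [
        statistic(block_bootstrap(series, block_len, rng)) for _ in range(n_rep)
    ]
    return percentile_interval(estimates)
```

In the setting of the text, `statistic` would be the full return level computation (step 3) applied to each resampled series; blocks of 10 days preserve short-range temporal correlation within each block.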


This procedure allows us to obtain a distribution for 20-year return levels in 2020
with which we can deduce a confidence interval for the return level estimator. The
estimate of this return level is 107 mm, and the 95% confidence interval is
[81.3 mm, 117.7 mm]. This interval is much narrower than the one obtained earlier (with
trends in the distributions’ parameters). The impact of the decrease in the mean and,
especially, the variance outweighs the increase in the number of days with rain,
and more strongly influences the estimated future return level than the decreasing
value of the threshold.

These examples show that the estimation of return levels in the non-stationary setting
lacks solid theoretical underpinnings, and that different ways
of treating the same problem can lead to fairly different estimates, even for relatively
non-extreme levels (here, 20 or 30 years). It is therefore illusory to hope to estimate,
in this context, extremely rare events (e.g., those associated with exceedance
probabilities of $10^{-4}$); this reality has been recognized by certain industrial actors
such as WENRA (the Western European Nuclear Regulators Association).


Chapter 9
Multivariate Extreme Value Theory:
Practice and Limits


Anne Dutfoy 


Abstract This chapter presents the most recent developments in extreme value the- 
ory in cases of joint, or multivariate, hazards. This type of phenomenon is increasingly 
being studied, as it can be more destructive than a single hazard. Introducing, in par- 
ticular, the notion of copula, this chapter takes up the notations and main concepts 
introduced in Chap. 6. 


9.1 Introduction 


9.1.1 Why Multivariate Extreme Values? 


Univariate extreme value analyses concern the probabilistic modeling of a phe- 
nomenon in its extreme values. However, especially in the natural hazards setting, 
natural processes are often correlated, and their extreme values arrive not in isolation 
but in clusters. For example, flood season is often also storm season, while heatwaves 
occur in summer at the same time that rivers are at their lowest (baseflow) levels. 

Detecting the potential for groups of extreme values to occur is a fundamental
problem in industrial dimensioning and civil engineering. A tragic illustration of this
was the nuclear accident at Fukushima in March 2011, a consequence of the combined
effects of an earthquake (which destroyed the nuclear power plant’s electricity cables
as well as several paths of communication) and a tsunami generated by the same
earthquake (which flooded the emergency power sources). Even though the structures
in question were designed to withstand each of these events separately, they might not
be able to withstand the simultaneity of those events.

Thus, looking at each natural hazard in isolation is not always sufficient, and it may 
be necessary to jointly analyze the extreme values of several phenomena, where the 
choice of relevant combinations (i.e., those which together could potentially impact 
a structure’s robustness) must be made by experts in the domain. 
Consider, for example, the probability that extreme values from two natural hazards
occur at the same time. When these processes are independent, i.e., when the
occurrence of one has no influence on the occurrence of the other, the probability of
both producing extreme values is simply the product of the probabilities. Formally, if
$A$ and $B$ are the extreme events under consideration, then $P(A \cap B) = P(A)\,P(B)$.
In such cases, multivariate analysis reduces to several univariate analyses in which
the goal is to calculate the marginal probabilities $P(A)$ and $P(B)$.

In contrast, when natural hazard processes are correlated, the combined probability
cannot be determined from the two probabilities alone, and $P(A \cap B) \neq P(A)\,P(B)$.
In particular, if the dependence is such that the occurrence of extreme
event $A$ leads to or favors that of $B$, then the combined probability is larger than
the product of the marginal ones: $P(A \cap B) > P(A)\,P(B)$. Thus, supposing that
these extreme events were independent would lead to a non-conservative estimate of
the probability of the events occurring together, namely, this probability would be
underestimated.
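
A small numerical illustration shows the size of the effect. The probabilities below are purely hypothetical: each event has marginal probability 0.01, and the conditional probability of $B$ given $A$ is set to 0.30 in the dependent case.

```python
def joint_probability(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


# Hypothetical marginal probabilities of the two extreme events.
p_a = p_b = 0.01

# Under independence, P(B | A) = P(B).
independent = joint_probability(p_a, p_b)

# Hypothetical dependent case: A strongly favors B.
dependent = joint_probability(p_a, 0.30)

underestimation_factor = dependent / independent
print(independent, dependent, underestimation_factor)
```

In this invented case, assuming independence would understate the joint probability by a factor of about thirty, which is the non-conservative error described above.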

The multivariate analysis alluded to here builds on results
from the rich theory of multivariate extreme values. Once the combined probability
of extreme events has been calculated for different levels of extremeness,
it is possible to construct maps of these probabilities. With the help
of appropriate visualizations, these tools can give an overall idea of the occurrence
of groups of extreme events for different levels of risk.

Joint analysis also allows us to model the joint probabilistic behavior of extreme- 
valued hazards. The multivariate probability distribution thus obtained can then be 
used in multiple ways. Notably, we can use it to model conditional extreme event 
scenarios and determine the distribution of a phenomenon given that other phenom- 
ena have exceeded some fixed extreme level. For example, what is the distribution 
of a river’s flow given that a violent storm is taking place? Or given that a river is in 
flood, what is the distribution of flows in a neighboring river in the same watershed? 


9.1.2 Difficulties in the Multivariate Setting 


Numerous concepts in the univariate setting lose their meaning when transferred to 
the multivariate one. It begins with the question of what being a multivariate extreme 
value even means. Is it when all of its components are extreme values, i.e., when 
all of the hazards have extreme values simultaneously? Or is it when at least one 
component is extreme, i.e., when at least one process has an extreme value, but not 
necessarily all of them? 

In what follows, we will see that models for multivariate extreme value theory
describe the probabilistic behavior of phenomena which are all simultaneously
extreme valued. On the other hand, model inference takes place jointly using all of
the data. Each datum is assigned a weight that depends on the number of its
extreme-valued components.

This question naturally leads to another: how should the maximum of a set of
multivariate data points be defined? Is it the data point containing the most extreme
value (no matter the component) in all of the data? Or rather, is it some mathematical
construct, such as, for instance, the vector containing the maximum from each
component?

It turns out that the theory of multivariate extreme values deals with the component-wise
vector of maxima $M_n$ over periods of length $n$. Interpreting and working with
such vectors is a tricky business. First, these vectors $M_n$ are not actual observed data
points. Second, the event $\{M_n > x\}$, where comparisons are component-wise and $x$
is a vector of extreme-valued thresholds, means that all of the extreme-valued events
occurred over the period in question, but not necessarily at the same time! Thus,
when jointly analyzing a flood and a storm, the limit distribution of $M_n$ allows us
to calculate the probability that both extreme events occurred during fall, without
making clear whether the events happened the same day or a month apart, two situations
which do not at all correspond to the same overall severity. As a consequence,
modeling the tail of the multivariate distribution will be the focus of study, rather
than modeling $M_n$.
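
The fact that the component-wise maximum need not be an observed data point is easy to see on a toy example. The pairs below are invented (river flow and wind speed on three days).

```python
def componentwise_maximum(observations):
    """Vector whose i-th entry is the maximum of the i-th component."""
    return tuple(max(component) for component in zip(*observations))


# Invented daily pairs: (river flow in m3/s, wind speed in km/h).
season = [(1200.0, 40.0), (800.0, 95.0), (1500.0, 30.0)]

m_n = componentwise_maximum(season)
is_observed = m_n in season
print(m_n, is_observed)
```

The largest flow and the strongest wind occur on different days, so the vector of maxima combines values that were never recorded together.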

Following the same thread, the notion of return period of a multivariate event 
must also be clarified. Is it the mean waiting time between arrivals in a domain 
defined in terms of extreme-valued thresholds? Or rather, defined by the quantile of 
the multivariate distribution associated with the event in question? 

De facto, the question arises of how to define the level associated with a given return
period, as the relationship between levels and return periods is no longer bijective:
a given return period may correspond to a hypersurface on which, using an additional
criterion, we must select a particular point, which will be the dimensioning
combination of the protection structure or civil work.

As brought up above, the simultaneous study of several extreme-valued haz- 
ards brings the additional notion of dependence to the table. It will be necessary to 
study, on top of the marginal behavior of each component, the dependence structure 
between the extreme values of different phenomena. Several types of dependence are 
possible. Between total independence and total dependence lies asymptotic indepen- 
dence, an intermediate situation in which real dependence diminishes for more and 
more extreme values. The variables can be either positively associated or negatively 
associated with each other. 

Finally, multivariate analyses are based on multivariate observed values, i.e.,
simultaneous measurements of several phenomena, obtained over a time period that 
is as long as possible. In practice, however, data for different phenomena are obtained 
independently of each other by groups such as Météo-France, but also EDF—which 
itself records the values of certain phenomena. These data sources are then brought 
together to form a multivariate dataset, which is only possible if measurements were 
made at the same frequency (daily, hourly, etc.) over the same period. In practice, 
joint analyses are restricted to the period in which data is available from all sources, 
which can be a real problem if the period in question turns out to be short.


9.1.3 Chapter Contents


In this chapter, we start by explaining with precision the most important notion in 
multivariate analyses: dependence between processes with extreme values. Then, 
we recall, without going into detail, the main fundamental theorems of multivariate 
extreme value theory. In [65, 173], readers can find an in-depth exposition of this 
theory. 

The first structural theorem concerns the limit distribution of multivariate max- 
ima and defines multivariate extreme value distributions (GEVM), whose marginal 
distributions are univariate GEV distributions linked by an extreme value copula. The
second theorem provides an approximation of the tail distribution of the multivariate 
distribution of the phenomena. This approximation can either be deduced from the 
limit distribution of the multivariate maxima, or come from some parametric model. 
We will describe Ledford and Tawn’s model, which allows all types of asymptotic 
dependence to be modeled. 

A section is then dedicated to multivariate inference of both the joint distribu- 
tion of the phenomena and the probability of combinations of extreme-valued events 
occurring. Notably, we make clear different possible strategies for inferring multi- 
variate distributions, e.g., should inference occur simultaneously or successively on 
the marginal distributions and the dependence structure? 

Lastly, we will give several examples showing how to understand and exploit 
results obtained by multivariate extreme value analyses, including modeling condi- 
tional scenarios, iso-quantile lines, iso-frequency contours, and bounding the prob- 
ability of multiple extreme values occurring simultaneously. 

As much as possible, we will illustrate the theoretical results using real French
data:


• Case 1: conjunction of flooding of the Loire and Vienne rivers at their confluence.
These two rivers are found in the same watershed. The data are daily maximum
flows reported jointly from 1958 to 2000 (source: HYDRO databank).

• Case 2: conjunction of a windstorm and extreme cold weather in the central region
of France. The data are daily maximum wind speeds and minimum air temperatures,
jointly recorded by Météo-France from 1971 to 2008.

• Case 3: conjunction of baseflow of the Garonne river and extreme cold weather.
The data are daily measurements of the maximum flow and minimum temperature,
jointly recorded by Météo-France from 1967 to 2012.

• Case 4: conjunction of flooding of the Rhone River and windstorms. The data
are daily maximal flows and maximal wind speeds jointly recorded from 1920 to
2011.

Remark 9.10 In addition, for French-speaking readers interested in technical details
of the theoretical results shown and used in this chapter, we suggest reading [154] and
[285], which deal, respectively, with multivariate modeling (in general, for studying
risk) and multivariate extreme values.


9.2 Nature of Dependence 
9.2.1 Asymptotic Independence 


Dependence between several random variables can be modeled by a function from
$\mathbb{R}^d \to [0, 1]$ called a copula. This is an essential ingredient of any multivariate
distribution, as made clear by Sklar (1959) in the following theorem.


Theorem 9.9 (Sklar [704]) Let $F$ be the multivariate cumulative distribution function
of a random vector of dimension $d$: $X = (X_1, \dots, X_d)$, with marginal cumulative
distribution functions $F_1, \dots, F_d$. Then there exists a
function $C : \mathbb{R}^d \to [0, 1]$, called a copula, such that for all $x \in \mathbb{R}^d$:

$$F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)).$$

If the $F_1, \dots, F_d$ are continuous, the copula $C$ is unique; otherwise, it is uniquely
determined on the image product space $\mathrm{Im}(F_1) \times \cdots \times \mathrm{Im}(F_d)$.


Any property of a random vector relating only to dependence corresponds to a
property of the copula function alone. For example, for independent random variables
$X_i$, the copula is the product copula $C(u_1, \dots, u_d) = \prod_{i=1}^{d} u_i$. This corresponds to
saying that the cumulative distribution function $F$ of the vector $X$ is the product of
the marginal cumulative distribution functions $F_i$: $F(x_1, \dots, x_d) = \prod_{i=1}^{d} F_i(x_i)$.
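
To make the role of the copula concrete, the sketch below evaluates the probability that two variables both exceed their order $u$ quantile, using the identity $P(U_1 > u, U_2 > u) = 1 - 2u + C(u, u)$ for uniform margins. The product copula is the one given above; the copula $\min(u_1, u_2)$, which corresponds to total dependence, is added here only for contrast and is not discussed in the text. Function names are illustrative.

```python
def product_copula(u):
    """Copula of independent variables: C(u_1, ..., u_d) = u_1 * ... * u_d."""
    result = 1.0
    for ui in u:
        result *= ui
    return result


def comonotone_copula(u):
    """Copula of totally dependent variables: C(u_1, ..., u_d) = min(u_i)."""
    return min(u)


def joint_exceedance(copula, u):
    """P(U1 > u, U2 > u) = 1 - 2u + C(u, u) for a bivariate copula."""
    return 1.0 - 2.0 * u + copula((u, u))


u = 0.9
print(joint_exceedance(product_copula, u))     # equals (1 - u) ** 2
print(joint_exceedance(comonotone_copula, u))  # equals 1 - u
```

With the same margins, the joint exceedance probability ranges from $(1-u)^2$ under independence to $1-u$ under total dependence, so the copula alone drives the difference.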

We now introduce the notion of asymptotic dependence of a random d-dimensional
vector. This definition requires only the copula $C_F$ of the distribution of the random
vector.


Definition 9.18 (Copula with asymptotic independence (d > 2)) A multivariate distribution
$F$ with copula $C_F$ is asymptotically independent if $C_F$ satisfies, for all
$u \in [0, 1]^d$,

$$\lim_{t \to \infty} C_F^t\left(u_1^{1/t}, \dots, u_d^{1/t}\right) = u_1 \cdots u_d.$$


Asymptotic independence of a random vector is equivalent to pairwise asymptotic 
independence of its components [84]. This result allows us to restrict analyses to 
asymptotic independence in the bivariate case, for which we now give a definition 
easier to understand. 


Definition 9.19 (Asymptotic independence (d = 2)) The scalar variables $X_1$ and $X_2$
with marginal distributions $F_1$ and $F_2$ are said to be asymptotically independent if
and only if:

$$\lim_{u \to 1} P\left(X_1 > F_1^{-1}(u) \mid X_2 > F_2^{-1}(u)\right) = 0. \qquad (9.1)$$

In other words, the probability that the variable $X_1$ exceeds its order $u$ quantile
(the term $F_1^{-1}(u)$) given that the variable $X_2$ has already exceeded its order $u$ quantile
(the term $F_2^{-1}(u)$) tends to 0 as we consider higher and higher quantiles $u$. This
relationship implies that the probability of the simultaneous occurrence of two processes’
extreme values is negligible with respect to the marginal probability of one
of them occurring.
There are several ways to check whether (9.1) holds. 


• The first way is to see whether the variables are completely independent, because,
in this case,

$$P\left(X_1 > F_1^{-1}(u),\, X_2 > F_2^{-1}(u)\right) = (1-u)^2,$$

and thus

$$P\left(X_1 > F_1^{-1}(u) \mid X_2 > F_2^{-1}(u)\right) = 1-u.$$

Hence, complete independence is a special type of asymptotic independence.
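
In practice, the conditional probability in (9.1) can be estimated empirically at a given level $u$. The sketch below is one simple way to do so, assuming rank-based pseudo-observations $r/(n+1)$ in place of the unknown marginal distributions; this estimator is an illustration and not necessarily the one used later in the chapter.

```python
def pseudo_observations(sample):
    """Map each value to rank / (n + 1), an estimate of F(value)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = (position + 1) / (n + 1)
    return ranks


def conditional_exceedance(x1, x2, u):
    """Empirical P(X1 above its u-quantile | X2 above its u-quantile)."""
    r1 = pseudo_observations(x1)
    r2 = pseudo_observations(x2)
    n_cond = sum(1 for b in r2 if b > u)
    if n_cond == 0:
        return float("nan")  # no observation above the level u
    n_both = sum(1 for a, b in zip(r1, r2) if a > u and b > u)
    return n_both / n_cond


# Two deterministic extremes: identical series and exactly opposed series.
x = list(range(100))
print(conditional_exceedance(x, x, 0.9))
print(conditional_exceedance(x, [-v for v in x], 0.9))
```

Plotting this estimate against increasing values of $u$ gives a first visual impression of whether the conditional probability tends to 0, keeping in mind that very few observations remain at high levels.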


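• The second way is more subtle: when the variables are dependent but with an
evanescent dependence as $u$ tends to 1, i.e., when¹

$$P\left(X_1 > F_1^{-1}(u),\, X_2 > F_2^{-1}(u)\right) = O\left((1-u)^{1/\eta}\right),$$

where $0 < \eta < 1$. Then,

$$P\left(X_1 > F_1^{-1}(u) \mid X_2 > F_2^{-1}(u)\right) = O\left((1-u)^{1/\eta - 1}\right).$$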

We further distinguish between three sub-cases:


1. $0.5 < \eta < 1$: here we say there is positive association. The combination of
extreme events $\{X_1 > F_1^{-1}(u)\}$ and $\{X_2 > F_2^{-1}(u)\}$ happens more often than
when the variables are totally independent. Since $1 < 1/\eta < 2$, we have

$$P\left(X_1 > F_1^{-1}(u),\, X_2 > F_2^{-1}(u)\right) > P\left(X_1 > F_1^{-1}(u)\right) P\left(X_2 > F_2^{-1}(u)\right).$$

2. $\eta = 0.5$ corresponds to exact asymptotic independence as described earlier, which
means we can write

$$P\left(X_1 > F_1^{-1}(u),\, X_2 > F_2^{-1}(u)\right) = P\left(X_1 > F_1^{-1}(u)\right) P\left(X_2 > F_2^{-1}(u)\right).$$

3. $0 < \eta < 0.5$: here we say there is negative association. The combination of
extreme events $\{X_1 > F_1^{-1}(u)\}$ and $\{X_2 > F_2^{-1}(u)\}$ happens less often than
when the variables are totally independent. Since $1/\eta > 2$, we have

$$P\left(X_1 > F_1^{-1}(u),\, X_2 > F_2^{-1}(u)\right) < P\left(X_1 > F_1^{-1}(u)\right) P\left(X_2 > F_2^{-1}(u)\right).$$


¹ Recall that $O(\cdot)$ means "is of the order of".


These considerations suggest that making the hypothesis that the phenomena 
are independent leads to overestimation of the probability of simultaneous extreme 
events in the case of negatively associated asymptotic independence, or underesti- 
mation in the case of positively associated asymptotic independence. 

It is the latter that engineers working on risks to structures fear, since the indepen- 
dence hypothesis puts in danger their claims to the robustness of their constructions. 
On the other hand, in the former case, the independence hypothesis—which strongly 
simplifies analyses—is not detrimental to their claims of robustness since, in real- 
ity, the extreme events occur simultaneously with a lower probability than the one 
obtained under the independence hypothesis. 


9.2.2 Data Visualization 


Visualization of multivariate data in various spaces is a qualitative technique to help 
understand the behavior of physical processes in the extreme values setting. The type 
of dependence between phenomena may be detectable in this way. 

Types of visualization (used for each of our examples) include those in the phys- 
ical (original sample) space, in the rank space, in the Fréchet space, and in the 
pseudo-polar space. Our examples will help illustrate the different types of asymp- 
totic dependence: 


• Case 1: asymptotic dependence of the two rivers when in flood;

• Case 2: negatively associated asymptotic independence of storms and extreme cold;

• Case 3: positively associated asymptotic independence of river baseflow and extreme cold;

• Case 4: exact independence between extreme flood and storm events.


9.2.2.1 Plotting Points in the Physical Space 


The physical space, or original sample space, corresponds to that of the raw data in 
its given units. This is both the most natural yet hardest space in which to interpret 
the data. In effect, the distribution of the points in this space depends on the data’s 
units as well as its marginal cumulative distribution functions. It is thus very difficult 
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to estimate how extreme valued the points actually are on this graph. The rank space 
is preferable to this, bringing with it more directly interpretable information. 

Figures 9.1, 9.2, 9.3, and 9.4 illustrate, in their first columns, the data obtained for each case study.


9.2.2.2 Plotting Points in the Rank Space 


Recall that the rank of an observed data point corresponds to its position in the list of the points after ordering them.

Definition 9.20 (Rank space) Let $X$ be a random $d$-dimensional vector. Let $F_i$ be the marginal cumulative distribution function of the distribution associated with the $i$-th component of $X$. The rank space of a multidimensional sample $x_1, \ldots, x_n$ of draws of $X$ is the space whose points are vectors of (normalized) rank statistics $s_1, \ldots, s_n$ defined for each dimension $i$ and for $1 \le k \le n$ by

\[ s_{i,k} = r_{i,k} / n, \]

where

\[ r_{i,k} = 1 + \sum_{j=1}^{n} \mathbb{1}_{\{x_{i,j} < x_{i,k}\}} \]

defines the rank (position) of the $i$-th component of the $k$-th vector in its marginal sample.

Each vector of normalized ranks $s_1, \ldots, s_n$ is, therefore, found in $[0, 1]^d$, and the raw data is transported to this space by the component-wise mapping

\[ F_1(X_1), \ldots, F_d(X_d). \]

As each function $F_i(\cdot)$ is increasing, this mapping does not alter the dependence structure in the data.

Fig. 9.1 Case 1: asymptotic dependence for flow measurements. Plotting the data (a) in the physical space, (b) in the rank space, (c) in the Fréchet space. The x-axis corresponds to the flow of the Loire River, the y-axis to that of the Vienne River. Values in the physical space are in m³/s

Fig. 9.2 Case 2: negatively associated asymptotic independence for high wind speed and low-temperature measurements. Plotting the data (a) in the physical space, (b) in the rank space, (c) in the Fréchet space. The x-axis corresponds to wind speeds (m/s in (a)), and the y-axis to temperatures (°C in (a))

Fig. 9.3 Case 3: positively associated asymptotic independence for low-flow and low-temperature measurements. Plotting the data (a) in the physical space, (b) in the rank space, (c) in the Fréchet space. The x-axis corresponds to flows (m³/s in (a)), and the y-axis to the temperature (°C in (a))

Fig. 9.4 Case 4: total independence for high-flow and high wind speed measurements. Plotting the data (a) in the physical space, (b) in the rank space, (c) in the Fréchet space. The x-axis corresponds to flows (m³/s in (a)), and the y-axis to wind speeds (m/s in (a))

The positioning of data in this new space is much easier to interpret than in the physical coordinate space since the components of the mapped data $s_1, \ldots, s_n$ all follow uniform distributions on $[0, 1]$, given that the effect of the marginal distributions $F_i$ has been removed. This rank space makes it easy to spot the dependence that exists between the extreme values of different variables.

Figures 9.1, 9.2, 9.3, and 9.4 show, in their second columns, the results for each example. The zones of interest appear to be top-right for Cases 1 and 4, bottom-right for Case 2, and bottom-left for Case 3.




These graphs show that asymptotic dependence corresponds to a point cloud with 
clearly visible trends in it—which persist for high quantiles. Negatively associated 
asymptotic independence corresponds to a point cloud that thins out for the most 
extreme quantiles. Positively associated asymptotic independence corresponds to a 
point cloud which becomes slightly denser when moving toward the more extreme 
quantiles. Finally, total independence corresponds to a point cloud spread out uni- 
formly in the extreme quantiles zone. 


9.2.2.3 Plotting Points in the Fréchet Space 


Here, the data undergoes a transform so that each component follows a standard Fréchet distribution with cumulative distribution function $F_F$ defined by $F_F(y) = e^{-1/y}$, $y > 0$. In the bidimensional case $X = (X_1, X_2)$, this transform is given by

\[ (Z_1, Z_2) = \left(F_F^{-1} \circ F_1(X_1),\; F_F^{-1} \circ F_2(X_2)\right), \tag{9.2} \]

where $F_1$ and $F_2$ are estimated using the empirical cumulative distribution functions. Since $F_F^{-1}(p) = -1/\log p$, this amounts to $Z_i = -1/\log F_i(X_i)$. Here again, this transform operates component-wise with an increasing function that again does not alter the dependence structure in the data.

So that the extreme values studied correspond to high quantiles of the Fréchet distribution, the data are flipped by multiplying them by -1 when the extreme values of interest correspond to lows rather than highs. This is what happens for the temperature data in Cases 2 and 3, and the flow data in Case 3.

Figures 9.1, 9.2, 9.3, and 9.4 show, in their third columns, the resulting plots. Note that only the plot for Case 1, which shows dependence between the two flood events, has points lying away from the axes.
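As a rough illustration of the two changes of variable described so far, the sketch below maps one component of a sample to the rank space and then to the Fréchet space using only empirical ranks. The function names, the `flip` argument, and the division by n + 1 in the Fréchet step (used so that the sample maximum is not sent to infinity) are assumptions made for this example and do not come from the original text.

```python
import math
from bisect import bisect_left


def ranks(values):
    """Rank of each observation: 1 + number of strictly smaller observations."""
    ordered = sorted(values)
    return [1 + bisect_left(ordered, v) for v in values]


def to_rank_space(values):
    """Normalized ranks s_k = r_k / n, which lie in (0, 1]."""
    n = len(values)
    return [r / n for r in ranks(values)]


def to_frechet(values, flip=False):
    """Map one component to the standard Frechet scale, Z = -1 / log F(X).

    F is replaced by r_k / (n + 1) (an assumption of this sketch) so that
    the largest observation is not sent to infinity. With flip=True the
    data are multiplied by -1 first, so that low values become the extremes.
    """
    if flip:
        values = [-v for v in values]
    n = len(values)
    return [-1.0 / math.log(r / (n + 1)) for r in ranks(values)]
```

Applying `to_frechet` to each component separately, with `flip=True` for variables whose extremes of interest are lows, yields the pairs $(Z_1, Z_2)$ of (9.2).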


9.2.2.4 Plotting Points in the Pseudo-polar Coordinate Space 


Here we perform the Fréchet change of variable as in (9.2), followed by the transform:

\[ (R, W) = \left(Z_1 + Z_2,\; \frac{Z_1}{Z_1 + Z_2}\right). \tag{9.3} \]


The random variable R is the sum of two positive variables. Bivariate data points 
corresponding to large values of R are, therefore, those for which at least one of 
them has a high rank. As for W, it takes values in [0, 1]. The event {W = 1/2} brings 
together all bivariate data points in which both of the components have the same 
rank, whatever it may be. 

To study combined occurrences of extreme values, we are thus interested in data points for which $R$ is large and $W$ is close to 1/2. More precisely, we select the bivariate data points corresponding to the largest values of $R$. We then plot the histogram of values of $W$ for these selected points in order to analyze its central part.

Fig. 9.5 Histograms of data in pseudo-polar coordinates. Case 1 (top): asymptotic dependence between flow measurements. Case 2 (bottom): negatively associated asymptotic independence between strong wind and cold temperature measurements

Figure 9.5 shows the histogram of W obtained in the cases of asymptotic depen- 
dence (Case 1) and negatively associated asymptotic independence (Case 2). For 
Case 1, the histogram has a non-negligible set of values around w = 1/2, thus show- 
ing that the simultaneous occurrence of extreme values is not negligible. For Case 2, 
the histogram has large peaks around 0 and 1, and very few values around w = 1/2, 
showing that here extreme values essentially occur in isolation. 
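The selection and histogram steps just described can be sketched as follows, starting from data already on the Fréchet scale. The function names, the `fraction` of points retained, and the number of bins are illustrative assumptions; the text does not specify how many of the largest values of $R$ are kept.

```python
def to_pseudo_polar(z1, z2):
    """Transform (9.3): R = Z1 + Z2 and W = Z1 / (Z1 + Z2)."""
    return [(a + b, a / (a + b)) for a, b in zip(z1, z2)]


def angles_of_largest_radii(z1, z2, fraction=0.1):
    """Values of W for the points with the largest R (top `fraction` of the sample)."""
    by_radius = sorted(to_pseudo_polar(z1, z2), key=lambda p: p[0], reverse=True)
    k = max(1, int(fraction * len(by_radius)))
    return [w for _, w in by_radius[:k]]


def histogram(ws, bins=10):
    """Counts of W values in `bins` equal-width classes covering [0, 1]."""
    counts = [0] * bins
    for w in ws:
        counts[min(int(w * bins), bins - 1)] += 1
    return counts
```

A histogram with most of its mass in the central classes points toward simultaneous extremes, whereas mass concentrated in the first and last classes indicates extremes occurring in isolation.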
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9.2.3 Measuring and Testing Dependence 


Complementing the qualitative analysis of dependence, numerous quantitative tools are available to help better characterize the type and strength of dependence, including the scalar coefficients $\chi$ and $\bar{\chi}$, which we present here. Interested readers can also find an innovative measure of asymptotic independence in [738].


9.2.3.1 Coefficients and Functions of Asymptotic Dependence 
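The asymptotic dependence coefficient $\chi$, with $0 \le \chi \le 1$, is defined by

\[ \chi = \lim_{u \to 1^-} P\left(X_1 > F_1^{-1}(u) \mid X_2 > F_2^{-1}(u)\right). \]

Therefore, by definition, asymptotic independence is equivalent to $\chi = 0$. This $\chi$ can also be obtained in the limit as $u \to 1^-$ of the value of the function $u \mapsto \chi(u)$ defined for each $u \in [0, 1]$ as

\[ \chi(u) = 2 - \frac{\log P\left(F_1(X_1) < u,\, F_2(X_2) < u\right)}{\log P\left(F_1(X_1) < u\right)} = 2 - \frac{\log C_F(u, u)}{\log u}. \tag{9.4} \]

In effect, it can be shown that

\[ \chi = \lim_{u \to 1^-} \chi(u). \tag{9.5} \]

We also introduce the $\bar{\chi}$ coefficient, with $-1 < \bar{\chi} \le 1$, defined by

\[ \bar{\chi} = \lim_{u \to 1^-} \bar{\chi}(u), \]

where the function $u \mapsto \bar{\chi}(u)$ is defined for each $u \in [0, 1]$ by

\[ \bar{\chi}(u) = \frac{2 \log P\left(F_1(X_1) > u\right)}{\log P\left(F_1(X_1) > u,\, F_2(X_2) > u\right)} - 1 = \frac{2 \log(1 - u)}{\log \bar{C}_F(u, u)} - 1, \]

with $\bar{C}_F$ the survival copula of $(X_1, X_2)$.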


The form the dependence takes can be analyzed as follows:

(i) For asymptotically dependent variables, $\bar{\chi} = 1$ and $0 < \chi \le 1$. In this case, $\chi$ measures the strength of the dependence.

(ii) For asymptotically independent variables, $\chi = 0$ and $-1 < \bar{\chi} < 1$; the greater the value of $\bar{\chi}$, the stronger the dependence. A negative association corresponds to $\bar{\chi} < 0$, a positive one to $\bar{\chi} > 0$.

Figures 9.6, 9.7, 9.8, and 9.9 illustrate the behavior of these coefficients for each example, showing plots of $u \mapsto \chi(u)$ and $u \mapsto \bar{\chi}(u)$. The limiting values seen as $u$ tends to 1 lead to the following conclusions:

• Case 2 corresponds to negatively associated asymptotic independence: $(\chi, \bar{\chi}) = (0, -0.4)$.

• Case 3 corresponds to positively associated asymptotic independence: $(\chi, \bar{\chi}) = (0, 0.1)$. The level of dependence is thus very low.

• Case 4 corresponds to total independence: $(\chi, \bar{\chi}) = (0, 0)$.

The graph for Case 1 is not conclusive and decisions will have to be made using other statistical tools.

Fig. 9.6 Case 1: asymptotic dependence of flow measurements; (a) $u \mapsto \chi(u)$, and (b) $u \mapsto \bar{\chi}(u)$, plotted against the quantile $u$

Fig. 9.7 Case 2: negatively associated asymptotic independence of strong wind and low-temperature measurements; (a) $u \mapsto \chi(u)$, and (b) $u \mapsto \bar{\chi}(u)$

Fig. 9.8 Case 3: positively associated asymptotic independence of low-flow and low-temperature measurements; (a) $u \mapsto \chi(u)$, and (b) $u \mapsto \bar{\chi}(u)$

Fig. 9.9 Case 4: total independence of high-flow and strong wind measurements; (a) $u \mapsto \chi(u)$, and (b) $u \mapsto \bar{\chi}(u)$
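The functions $\chi(u)$ and $\bar{\chi}(u)$ can be estimated empirically by replacing the joint probabilities in their definitions with sample proportions computed on normalized ranks. The following is a minimal sketch under that assumption; the function names and the choice of returning NaN when a proportion is degenerate are illustrative and not taken from the text.

```python
import math


def chi_u(s1, s2, u):
    """Empirical chi(u) = 2 - log C(u, u) / log u from normalized ranks in (0, 1)."""
    n = len(s1)
    joint_below = sum(1 for a, b in zip(s1, s2) if a < u and b < u) / n
    if joint_below <= 0.0:
        return float("nan")  # no point below u in both components
    return 2.0 - math.log(joint_below) / math.log(u)


def chi_bar_u(s1, s2, u):
    """Empirical chi_bar(u) = 2 log(1 - u) / log P(both ranks > u) - 1."""
    n = len(s1)
    joint_above = sum(1 for a, b in zip(s1, s2) if a > u and b > u) / n
    if joint_above <= 0.0 or joint_above >= 1.0:
        return float("nan")  # the logarithm of the proportion is unusable
    return 2.0 * math.log(1.0 - u) / math.log(joint_above) - 1.0
```

Evaluating both functions on a grid of values of $u$ approaching 1 gives curves of the kind shown in Figs. 9.6 to 9.9. The proportions rest on fewer and fewer points as $u$ increases, so the estimates become increasingly variable near 1.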
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9.2.3.2 Detecting Asymptotic Independence 


Detecting cases of asymptotic independence is crucial as this prohibits the subsequent 
use of certain modeling methods, as we will see in the following. This is the case, 
for instance, when the modeling is based on the limit distribution of multivariate 
maxima. 

Numerous tests for detecting asymptotic independence can be found in the litera- 
ture. Readers can turn to [401, 750, 751, 801] for technical details and comparisons. 
Recently, [143] has comprehensively reviewed the state of the art when it comes 
to these tests, which include—among others—score tests, the gamma test, and the 
madogram test. We now describe the idea behind one of the most powerful such tests, 
proposed by Ledford and Tawn [473]. 

Ledford and Tawn [473] have introduced a new scalar coefficient $\eta$ known as the tail dependence coefficient, with $0 < \eta \le 1$, described in more detail in Sect. 9.3.4. This scalar value allows us to test for asymptotic independence in data. With a weak hypothesis on the tail of the multivariate distribution $F$, Ledford and Tawn show that asymptotic dependence implies $\eta = 1$. They also suggest testing for asymptotic dependence by testing the simple hypothesis $H_0 = \{\eta = 1\}$ against the composite alternative $H_1 = \{0 < \eta < 1\}$. Theoretically, $H_0$ is not equivalent to asymptotic dependence, but in practice we can equate the two [65]. Several estimators of $\eta$ can be found in the literature, some of which are described in Sect. 9.3.4.

Figure 9.10 illustrates the results obtained on our examples. For Case 1, the decision is to retain the hypothesis $H_0$: although the estimate of $\eta$ is more or less constant and below 1, the 95% confidence interval always contains 1. In contrast, for Cases 2 and 3, the decision is made to reject $H_0$: the estimate of $\eta$ is again more or less constant and below 1, but the 95% confidence intervals for the various thresholds do not contain the value 1.
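To make the decision rule concrete, here is a minimal sketch of one possible estimator of $\eta$. It is not the procedure behind Fig. 9.10: it swaps in the Hill estimator, applied to the structure variable $T = \min(Z_1, Z_2)$ computed from data on the Fréchet scale, with the usual asymptotic interval of half-width $1.96\,\hat{\eta}/\sqrt{k}$. The definition of $T$, the function names, and the number $k$ of upper order statistics are assumptions of this example.

```python
import math


def hill_eta(z1, z2, k):
    """Hill estimate of eta from the k largest values of T = min(Z1, Z2).

    Returns (eta_hat, (lower, upper)), where the interval is the usual
    asymptotic 95% interval eta_hat * (1 -/+ 1.96 / sqrt(k)).
    """
    t = sorted((min(a, b) for a, b in zip(z1, z2)), reverse=True)
    if not 1 <= k < len(t):
        raise ValueError("k must satisfy 1 <= k < sample size")
    threshold = t[k]  # (k + 1)-th largest value of T
    eta_hat = sum(math.log(t[i] / threshold) for i in range(k)) / k
    half_width = 1.96 * eta_hat / math.sqrt(k)
    return eta_hat, (eta_hat - half_width, eta_hat + half_width)


def reject_asymptotic_dependence(z1, z2, k):
    """Reject H0: eta = 1 when the 95% interval lies entirely below 1."""
    _, (_, upper) = hill_eta(z1, z2, k)
    return upper < 1.0
```

Repeating the computation for several values of `k` (equivalently, several thresholds) and checking whether the intervals contain 1 mirrors the reading of Fig. 9.10 given above.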


9.3 Fundamental Results 


9.3.1 Partial and Marginal Ordering 


The vectorial order relation used in the analysis of multivariate extreme values is 
marginal ordering, the definition of which is as follows. 


Definition 9.21 (Marginal ordering) Let $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$. We say that $x$ is less than or equal to $y$, and write $x \le y$, if and only if:

\[ x \le y \iff x_i \le y_i, \quad \forall i = 1, \ldots, d. \]

This order relation is a partial one as it cannot order all pairs of vectors; for each $x$, we can find some $y$ such that neither $x \le y$ nor $y \le x$ is true. As for the maximum, it is defined component-wise:

Definition 9.22 (Maximum of two vectors) Let $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$. The maximum of $x$ and $y$, written $x \vee y$, is defined as

\[ x \vee y = (x_1 \vee y_1, \ldots, x_d \vee y_d). \]

Note that $x \vee y$ may be equal to neither $x$ nor $y$. Thus, in the context of $n$ data points $(x_1, \ldots, x_n)$, the maximum $M_n = x_1 \vee \cdots \vee x_n$ may not be equal to any of them.

Fig. 9.10 Estimation of the shape parameter $\eta$ of the GPD of $T$ for several thresholds. Case 1 (top): $H_0$ is not rejected. Cases 2 and 3 (center and bottom): $H_0$ is rejected


9.3.2 Limit Distributions of Multivariate Maxima 


Theorem 9.10 extends Leadbetter’s univariate result from 1974 [461]. It proves that a 
limit distribution for multivariate maxima exists for multivariate stationary processes 
under a condition D(u,) bounding the temporal dependence in the process, where 
u, is a high threshold. Details of this highly technical condition are not given here, 
but can be found in [65, p 420], as well as in Definition 6.10 in the present book for 
the univariate case only. 


Theorem 9.10 Let $X$ be a $d$-dimensional stationary process with marginal distribution $F$. We say that $F$ is in the domain of attraction of a limit distribution of multivariate maxima $G$ (GEV), written $F \in \mathcal{D}(G)$, if there exist sequences $(a_n)_n > 0$ and $(b_n)_n$ in $\mathbb{R}^d$, and a nondegenerate distribution $G$, such that

\[ P\left(a_n^{-1}(M_n - b_n) \le x\right) \to G(x), \quad n \to +\infty, \]

and if condition $D(u_n)$ is satisfied for $u_n = a_n x + b_n$ for all $x$ such that $G(x) > 0$.

Considering this for each marginal distribution, $F \in \mathcal{D}(G)$ implies that $F_i \in \mathcal{D}(G_i)$, which means that each marginal distribution $G_i$ is a limit distribution of univariate maxima (GEV) of the type Fréchet, Weibull, or Gumbel.

The limit distribution of the maxima $G$ has the following characteristics:

• $G$ is a max-stable distribution, meaning that it is in its own domain of attraction (cf. Definition 6.5);

• $G$ is a max-infinitely divisible distribution, meaning that for each $k \in \mathbb{N}$, $G^{1/k}$ remains a multivariate extreme value distribution.


The dependence structure of the limit distribution of the maxima is known thanks 
to the help of the following result. 


Definition 9.23 (Extreme value copulas) If $C_F$ is the copula of $F$, then $F \in \mathcal{D}(G)$ implies that for each $u \in [0, 1]^d$:

\[ C_G(u) = \lim_{n \to \infty} C_F^n\left(u_1^{1/n}, \ldots, u_d^{1/n}\right), \]

where $C_G$ is the copula of $G$.

We can show that for each $u$ for which each component $u_i$ is close to 1, the max-stability of $G$ implies that

\[ C_F(u) \simeq C_G(u). \tag{9.6} \]

Thus, the tail of the copula of $F$ is that of the limit distribution of the maxima $G$ when $F \in \mathcal{D}(G)$.


9.3.2.1 Tail Dependence Function 


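The properties of the distribution $G$ also imply that it can be defined in the following way: for each $x \in \mathbb{R}^d$,

\[ G(x) = \exp\left[-\ell\left(-\log G_1(x_1), \ldots, -\log G_d(x_d)\right)\right], \]

where $\ell : \mathbb{R}_+^d \to \mathbb{R}_+$ is the tail dependence function satisfying:

\[ \forall s > 0, \quad \ell(s\,\cdot) = s\,\ell(\cdot), \]
\[ \forall i = 1, \ldots, d, \quad \ell(e_i) = 1, \]
\[ \text{and} \quad \forall v > 0, \quad v_1 \vee \cdots \vee v_d \le \ell(v) \le v_1 + \cdots + v_d. \]

The function $\ell$ is defined for each $v \in \mathbb{R}_+^d$ by

\[ \ell(v) = \lim_{t \to +\infty} t\, P\left(F_1(X_1) > 1 - \frac{v_1}{t} \;\text{or}\; \ldots \;\text{or}\; F_d(X_d) > 1 - \frac{v_d}{t}\right). \]

The tail dependence function only provides information about the dependence structure of $G$, and is linked to the copula $C_G$ as follows:

\[ C_G(u) = \exp\left[-\ell\left(-\log u_1, \ldots, -\log u_d\right)\right]. \]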


Remark 9.11 Unlike the univariate case where the class of limit distributions of maxima is the GEV family parameterized by three scalars $(\mu, \sigma, \xi)$, the class of limit distributions of multivariate maxima has no finite parameterization since it is indexed by the tail dependence function $\ell$.

We now give the expression for $\ell$ in the limit cases of dependence.

(i) The case where $\ell(v) = v_1 \vee \cdots \vee v_d$ corresponds to $C_G(u) = u_1 \wedge \cdots \wedge u_d$ (where $\wedge$ = min). This characterizes total asymptotic dependence: components are connected via a deterministic formula in their extreme values.

(ii) The case where $\ell(v) = v_1 + \cdots + v_d$ corresponds to $C_G(u) = \prod_{i=1}^{d} u_i$, which characterizes total independence.

(iii) For intermediate cases, [65] provides a detailed list of models for the tail dependence function $\ell$. For a well-chosen set of parameters, these models allow all types of asymptotic dependence to be modeled.


9.3.2.2 Pickands’ Function 
Much attention has been paid to the two-dimensional case. In this setting we prefer, in place of $\ell$, Pickands' dependence function, denoted $A$, which is defined on $[0, 1]$, takes values in $[1/2, 1]$, and is linked to $\ell$ by the formula

\[ A(t) = \ell(1 - t, t), \quad \forall t \in [0, 1], \]

or equivalently,

\[ \ell(v_1, v_2) = (v_1 + v_2)\, A\left(\frac{v_2}{v_1 + v_2}\right), \quad \forall (v_1, v_2) > 0. \]

Pickands' function is convex and satisfies $\max\{t, (1 - t)\} \le A(t) \le 1$ for $t \in [0, 1]$ [616]. The case $A(t) = 1$ corresponds to total asymptotic independence, and $A(t) = (1 - t) \vee t$ to total asymptotic dependence. However, the bounding and convexity constraints on $A$ imply that

\[ A(t) = 1 \iff A(1/2) = 1 \]

and

\[ A(t) = (1 - t) \vee t \iff A(1/2) = 1/2. \]

Thus, the real number $A(1/2)$ alone can characterize the type of dependence seen.


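Finally, $G$ and $C_G$ can be written as functions of $A$ for each $x = (x_1, x_2) \in \mathbb{R}^2$ and $u = (u_1, u_2) \in [0, 1]^2$:

\[ G(x_1, x_2) = \exp\left\{\log\left[G_1(x_1) G_2(x_2)\right] A\left(\frac{\log G_2(x_2)}{\log\left(G_1(x_1) G_2(x_2)\right)}\right)\right\} \]

and

\[ C_G(u_1, u_2) = \exp\left\{\log(u_1 u_2)\, A\left(\frac{\log u_2}{\log(u_1 u_2)}\right)\right\}. \tag{9.7} \]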


Specialists in this domain have proposed several parametric (e.g., the logistic 
model in (9.16)) and nonparametric models for A(t), and numerous authors have 
provided statistical estimators of this function. See [125] and [363] for the most 
recent results and bibliographies on the subject. 
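For illustration, the sketch below evaluates formula (9.7) for a given Pickands function. The logistic form used here, $A(t) = \left((1-t)^{1/\alpha} + t^{1/\alpha}\right)^{\alpha}$ with $0 < \alpha \le 1$, is one commonly used parameterization and is an assumption of this example; the parameterization of the logistic model in (9.16) may differ. The function names are hypothetical.

```python
import math


def pickands_logistic(t, alpha):
    """Logistic Pickands function A(t), assumed parameterization, 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return ((1.0 - t) ** (1.0 / alpha) + t ** (1.0 / alpha)) ** alpha


def ev_copula(u1, u2, pickands):
    """Bivariate extreme value copula of (9.7) for u1, u2 strictly inside (0, 1)."""
    log_prod = math.log(u1) + math.log(u2)
    return math.exp(log_prod * pickands(math.log(u2) / log_prod))
```

With `pickands = lambda t: 1.0` the function returns the product $u_1 u_2$, and with `pickands = lambda t: max(t, 1 - t)` it returns $\min(u_1, u_2)$, which are the two limit cases discussed above.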
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9.3.3 Limitations of GEV Modeling 


In this section, we show that the limit distributions of multivariate maxima cannot 
model all types of dependence in extreme values, notably certain types of asymptotic 
independence. 


Remark 9.12 Formula (9.7) implies that any copula $C_G$ of a bivariate maxima distribution satisfies

\[ C_G(u, u) = u^{2A(1/2)}. \tag{9.8} \]

Formulas (9.4) and (9.6) mean that we have $\chi = 2[1 - A(1/2)]$. Thus, the only way to model asymptotic independence is to set $A(1/2) = 1$, which corresponds to modeling total independence and thus forcing the limit distribution of the maxima to be of the form $G(x_1, x_2) = G_1(x_1) G_2(x_2)$.
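The intermediate steps behind this remark can be written out as follows; this is only a sketch that combines (9.4), (9.6), and (9.7), with no new assumption beyond those formulas. Setting $u_1 = u_2 = u$ in (9.7) gives

\[ C_G(u, u) = \exp\left\{\log(u^2)\, A\left(\frac{\log u}{2 \log u}\right)\right\} = \exp\left\{2 A(1/2) \log u\right\} = u^{2A(1/2)}, \]

and, since $C_F(u, u) \simeq C_G(u, u)$ for $u$ close to 1,

\[ \chi = \lim_{u \to 1^-} \left(2 - \frac{\log C_F(u, u)}{\log u}\right) = 2 - \frac{2 A(1/2) \log u}{\log u} = 2\left[1 - A(1/2)\right]. \]

In particular, $\chi = 0$ if and only if $A(1/2) = 1$.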

De facto, the only multivariate GEV distribution that models asymptotic inde- 
pendence is the total independence one. Thus, GEV distributions cannot be used to 
model positively or negatively associated asymptotic independence. 

The use of the limit distribution of multivariate maxima to estimate the probability 
of simultaneous exceedances of high thresholds can have fatal consequences if the 
marginal processes are asymptotically independent. In effect, the independent copula 
underestimates the weight of simultaneous extreme values in the positive association 
case, and overestimates it in the negative one. 

In the case of asymptotic independence, it is, therefore, necessary to model the 
tail of the distribution F with the help of models other than those coming from 
the limit distributions of multivariate maxima. For instance, the Ledford and Tawn 
model [473] is an alternative which can model asymptotic dependence just as well 
as asymptotic independence. 


9.3.4 Tails of Multivariate Distributions 


9.3.4.1 Non-temporally Correlated Process 
If $X$ is a process with no temporal correlation, then

\[ P\left(a_n^{-1}(M_n - b_n) \le x\right) = F^n(a_n x + b_n), \]

and under the hypotheses for the existence of the nondegenerate limit distribution of normalized maxima $G$, Theorem 9.10 implies that, for large $n$,

\[ F^n(x) \simeq G(x), \quad \forall x \in \mathbb{R}^d, \]

where the constants $(a_n, b_n)$ have been folded into the parameters of $G$ due to its max-stability.

With the latter formula, we can approximate the tail of the distribution $F$, valid for $x_i$ such that $F_i(x_i)$ is close to 1:

\[ F(x) \simeq \exp\left\{-\ell\left(-\log F_1(x_1), \ldots, -\log F_d(x_d)\right)\right\}, \tag{9.9} \]

where the $F_i$ are the marginal distributions of the process.
In contrast, the relationship between the multivariate distributions $F$ and $G$ is difficult to obtain when the process exhibits temporal correlation.


9.3.4.2 Time-Correlated Processes 


In the case where each marginal process $X_i$ has temporal correlation characterized by an extremal index $\theta_i$ (see Definition 6.9), we have that

\[ G_i = F_i^{n \theta_i}. \]

As the indices $\theta_i$ are not necessarily all equal, the link between $G$ and $F$ is not self-evident. Nandagopalan [545] has proposed extending the definition of the extremal index $\theta$ given in the univariate context to the multivariate one by defining the multivariate extremal index


_ logG {Gr (e-™),..., Gz! (ei) 


0(v) = 2 
log G {Gy (e"),..., Ge") 


, Wve (0, co). 


It was necessary to wait until 2005 for Beirlant [65] and Martins and Ferreira [506] to propose theoretically motivated estimators of the function $\theta(v)$, despite a brave earlier attempt by [709]. A nonparametric solution to the inherently difficult inference of the multivariate extremal index function [247] was nevertheless recently provided by [653].

Note, however, that the tail model obtained from a GEV model of the limit distribution of maxima can only model total independence or asymptotic dependence (cf. Sect. 9.3.3). Thus, for a process with temporal correlation, just as in the asymptotic independence case, it is necessary to model the tail of $F$ differently from (9.9).


9.3.4.3 Ledford and Tawn Model 


The Ledford and Tawn model is an interesting alternative. For clarity, we describe it 
here in two dimensions even though it is defined for any number. 

Ledford and Tawn transform the data to obtain Fréchet marginals as in (9.2), and propose the following tail model on the transformed data $(Z_1, Z_2)$:

\[ P(Z_1 > z_1, Z_2 > z_2) = \mathcal{L}(z_1, z_2)\,(z_1 z_2)^{-1/(2\eta)}, \quad \forall z_i > 0, \tag{9.10} \]

where $\eta$ is the tail dependence coefficient with $0 < \eta \le 1$, and $\mathcal{L}$ a slowly varying bivariate function. The latter hypothesis, whose mathematical definition is not shown here, is quite general and unconstraining. It implies that the probability of simultaneous extreme threshold exceedances decreases in a regular way and faster than the corresponding marginal probabilities.

Ledford and Tawn demonstrated the use of their model in several examples, and 
[382] has shown its quite broad applicability on an even greater number. However, 
[688] has provided several counter-examples, though these seem to be somewhat 
pathological cases [65]. 

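Model (9.10) allows us to give the extreme value dependence coefficients $\chi$ and $\bar{\chi}$ as functions of $\mathcal{L}$ and $\eta$. From the expression for the survival copula $\bar{C}$ of $(X_1, X_2)$,

\[ \bar{C}(u, u) = P\left(Z_1 > -1/\log u,\; Z_2 > -1/\log u\right) = \mathcal{L}\left(-1/\log u,\, -1/\log u\right)(-\log u)^{1/\eta}, \]

we can deduce that

\[ \bar{\chi} = \lim_{u \to 1} \bar{\chi}(u) = 2\eta - 1 \quad \text{and} \quad \chi = \lim_{z \to \infty} \mathcal{L}(z, z)\, z^{1 - 1/\eta}. \tag{9.11} \]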
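The first limit in (9.11) can be checked in a few lines; the following is a sketch of the argument, using only the definition of $\bar{\chi}(u)$ and the slow variation of $\mathcal{L}$. Writing $z = -1/\log u$, the expression for the survival copula gives

\[ \log \bar{C}(u, u) = \log \mathcal{L}(z, z) + \frac{1}{\eta} \log(-\log u). \]

As $u \to 1$, we have $-\log u \sim 1 - u$, and $\log \mathcal{L}(z, z)$ is negligible with respect to $\log z$ because $\mathcal{L}$ is slowly varying. Hence $\log \bar{C}(u, u) \sim \frac{1}{\eta} \log(1 - u)$ and

\[ \bar{\chi}(u) = \frac{2 \log(1 - u)}{\log \bar{C}(u, u)} - 1 \;\longrightarrow\; 2\eta - 1. \]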


Therefore, Ledford and Tawn can model:

• asymptotic dependence when $\eta = 1$ if $\lim_{z \to \infty} \mathcal{L}(z, z) = c$, with $0 < c \le 1$, and in this case, $\chi = c$ measures the strength of this dependence;

• positively associated asymptotic independence when $1/2 < \eta < 1$, or with $\eta = 1$ and $\lim_{z \to \infty} \mathcal{L}(z, z) = 0$;

• quasi-independence when $\eta = 1/2$, or even total independence if furthermore $\mathcal{L}(z, z) = 1$;

• negatively associated asymptotic independence when $0 < \eta < 1/2$.


Example 9.25 (Farlie-Gumbel-Morgenstern copula)

Suppose that $F$ has a Farlie-Gumbel-Morgenstern copula $C_F$:

\[ C_F(u_1, u_2) = u_1 u_2 \left(1 + \alpha (1 - u_1)(1 - u_2)\right) \]

with parameter $-1 \le \alpha \le 1$. This copula is the independent one for $\alpha = 0$. It can be shown that

\[ \chi(u) = 2 - \frac{\log\left(u^2 \left\{1 + \alpha (1 - u)^2\right\}\right)}{\log u}, \]

which implies that $\chi(u) \to \chi = 0$ as $u \to 1$, as long as $\alpha \ne 0$. Thus, all distributions $F$ whose copulas belong to this family are asymptotically independent. For $z \to \infty$, the formula

\[ P(Z_1 > z, Z_2 > z) = \frac{\alpha + 1}{z^2} - \frac{3\alpha + 1}{z^3} + \frac{55\alpha + 7}{12 z^4} + o(z^{-4}) \]

allows us to obtain $\eta$ and $\mathcal{L}$ as functions of $\alpha$:

• for $\alpha > -1$, we have $\eta = 1/2$ and $\mathcal{L}(z, z) = (\alpha + 1) - (3\alpha + 1) z^{-1} + o(z^{-1})$ for $z \to \infty$: this is quasi-independence,

• for $\alpha = -1$, we have $\eta = 1/3$ and $\mathcal{L}(z, z) = 2 - 4 z^{-1} + o(z^{-1})$ for $z \to \infty$: this corresponds to the negatively associated case.
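The expansion used in this example can be checked numerically. The sketch below compares the exact joint survival probability of the Farlie-Gumbel-Morgenstern copula with unit Fréchet margins to the three-term expansion; the function names are hypothetical, and the closed form in $p = 1 - e^{-1/z}$ is obtained here by expanding $1 - 2u + C_F(u, u)$ at $u = 1 - p$.

```python
import math


def fgm_joint_survival(z, alpha):
    """Exact P(Z1 > z, Z2 > z) for an FGM copula with unit Frechet margins.

    With p = 1 - exp(-1/z), expanding 1 - 2u + C_F(u, u) at u = 1 - p gives
    (1 + alpha) p^2 - 2 alpha p^3 + alpha p^4, which avoids cancellation.
    """
    p = -math.expm1(-1.0 / z)
    return (1.0 + alpha) * p**2 - 2.0 * alpha * p**3 + alpha * p**4


def fgm_expansion(z, alpha):
    """Three-term expansion of the joint survival probability for large z."""
    return (
        (alpha + 1.0) / z**2
        - (3.0 * alpha + 1.0) / z**3
        + (55.0 * alpha + 7.0) / (12.0 * z**4)
    )
```

For $\alpha = -1$ the leading term vanishes, so that `fgm_joint_survival(z, -1.0) * z**3` approaches 2 as `z` grows, in agreement with $\eta = 1/3$.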


Example 9.26 (Gaussian bivariate copula)

If $F$ has a Gaussian bivariate copula $C_F$ with correlation coefficient $\rho < 1$, then for $z \to \infty$

\[ P(Z_1 > z, Z_2 > z) \sim c_\rho\, (\log z)^{-\rho/(1+\rho)}\, z^{-2/(1+\rho)}, \]

where $c_\rho = (1 + \rho)^{3/2} (1 - \rho)^{-1/2} (4\pi)^{-\rho/(1+\rho)}$. For any value of $\rho$, the distribution is asymptotically independent: $\chi = 0$, with $\bar{\chi} = \rho$ and $\eta = (1 + \rho)/2$. The positively associated case is modeled by $\rho > 0$, the negatively associated one by $\rho < 0$, and the total independence one by $\rho = 0$.

Remark 9.13 These copulas are not those of limit distributions of multivariate maxima since $C_F(u, u) \ne u^{2A(1/2)}$.


9.4 Study Steps 
9.4.1 Overview 


A study of multivariate extreme values begins, like in the univariate case, by looking 
for evidence of a trend and/or seasonal effects in the data. After estimating these and 
removing their influence from the data, what remains corresponds to the standard 
stationary process setting, which is necessary for the validity of all of the results 
given earlier in the chapter. 

In a general sense, the non-stationarity of a process may be due to the presence of a break point (e.g., the date at which the process began to be measured, or some change to the environment that significantly impacted the process), the presence of a trend (e.g., increase in air temperature due to climate change), or a seasonal effect.

The study of non-stationary multivariate processes is extremely complex, and a 
lot of research has been undertaken in this direction. In the case of a simple seasonal 
effect, practitioners can simply constrain their analyses to a period of the year dur- 
ing which each marginal phenomenon is stationary. This strategy presupposes that 
the dependence structure between phenomena is itself stationary over the relevant 
period. This hypothesis seems reasonable and is satisfied if, for example, dependence 
between the given processes is constant over time. In this case, the techniques to use 
are those of the univariate analysis detailed in Chap. 6 of this book. 

Once a common stationary period has been found, it is essential, as we have seen 
before, to determine the type of dependence existing between the extreme values of 
different processes. Indeed, on this depends whether to use a model obtained from 
the limit distribution of the maxima, or an alternative one modeling the tail of the 
multivariate distribution—like the quite general Ledford and Tawn model. 

This step begins with plotting the data in the various spaces described in Sect. 9.2.2. This qualitative analysis can be supplemented with the quantitative one given in Sect. 9.2.3, using statistical tests to accept or reject the asymptotic independence hypothesis. Calculation of the coefficients $\chi$ and $\bar{\chi}$ can also help in this decision-making.

Depending on the type of dependence and the goals of the study, a practitioner 
can then estimate an approximation of the tail of the distribution F or the probability 
of simultaneous extreme events directly. 

Modeling the limit distribution of the maxima can also be used to approximate 
the tail of the distribution with Eq. (9.9) in the total independence or extreme value 
dependence cases, and when the phenomena in question have no temporal correlation. 
This is, for example, the case in the wind speed example, where the daily maximums 
can be considered independent of each other. This is not, however, the case for many 
other phenomena, such as air temperature and river flows. In the latter cases, one can 
model the tail distribution of F using an alternative model that can take into account 
the range of types of extreme value dependence. 

Finally, there exist several estimators of the probability of simultaneous extreme values that can be applied directly to the data without first estimating the tail of $F$. This type of probability estimation comes with a confidence interval calculated using the asymptotic distribution of the estimator in question. This interval gives an idea of how precise the estimator is. If the probability of simultaneous extreme values is estimated over a grid of levels of extremeness, practitioners can then construct plots showing the probabilities of simultaneous extreme values, giving at a glance a visual idea of when the simultaneous probability exceeds certain limit probabilities chosen a priori.

As in the univariate case, it is possible to define the annual frequency of (now 
simultaneous) extreme events in several ways, including (i) the mean annual number 
of days where the simultaneous events occur, (ii) the number of episodes of simul- 
taneous extreme events, and (iii) the probability that a simultaneous extreme event 
occurs at least once per year. Readers should examine Sect. 6.3.2 of this book for 
more precise details on the fundamental differences between these three definitions. 
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9.4.2 Inference Strategies 


Sklar's theorem (Theorem 9.9) describes how a multivariate distribution is constructed
from the marginal distributions $F_i$ and the dependence structure, i.e., the copula
$C_F$. Thus, practitioners need to infer this function $C_F$ as well as the set of univariate
distributions $F_i$. In the case of Model (9.9), obtained from the limit distribution of the maxima,
inference is performed on the tails of the marginal distributions $F_i$, as well as on the
tail dependence function $\ell$.

In Ledford and Tawn's alternative model, determining the marginals $F_i$ allows us
to transform the data as in (9.2) and then model dependence in this transformed data
using the model (9.10).

The question is then: in which order should inference on the marginal distributions 
and inference on the dependence structure occur? Several strategies are possible, 
which impact in different ways the overall estimation of the multivariate model. 


9.4.2.1 First Strategy 


This strategy involves first estimating the marginal distribution, then the dependence 
structure, in two separate steps. 


(i) Parametric inference is used on the marginal distributions because the marginals 
of the limit distribution of the maxima follow GEV distributions, and those of 
approximation (9.9) are written with the help of GPDs. 

(ii) Following this, the dependence structure can be estimated using the data trans- 
formed to follow a Fréchet distribution. We distinguish between two ways of 
transforming the data: 


• using the models established for the marginal distributions $F_i$, or
• using the empirical marginal cumulative distribution functions, defined by

$$
\hat{F}_i^{\mathrm{emp}}(x_i) = \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}_{\{X_{i,k} \le x_i\}} \tag{9.12}
$$

for $x_i \in \mathbb{R}$, where $n$ is the number of data and $(X_{i,1}, \ldots, X_{i,n})$ the $n$ measured
values of the $i$-th component of the phenomenon.


This first strategy increases the risk of poorly evaluating the dependence structure 
since estimation of the joint model is performed on data transformed by a potentially 
badly fitted model. In contrast, the approach below totally decorrelates the estimation 
of the joint model from that of the marginal ones. It is therefore recommended [65, 
p. 318]. 
9.4.2.2 Second Strategy (Recommended)


This strategy consists of inferring the whole multivariate model in a single step, i.e., 
the marginal distributions and dependence structure are estimated at the same time. 
A complete parametric model can be used; this means jointly estimating the whole
set of parameters defining the marginal models and the tail dependence function $\ell$.

This strategy thus allows information transfer between the marginal distributions
and the dependence function. It also has the advantage of allowing constraints to be
placed between parameters from different marginal distributions, e.g., imposing the
same coefficient $\xi$ on several marginal distributions $F_i$.

The downside of this strategy is possible complexity in the optimization problem 
it leads to, which only gets worse as the number of marginal distributions (and thus 
parameters) increases. One trick is to first perform a univariate analysis and use its 
output as the starting point for the global optimization problem. 


9.4.3 Inference 


Inference of the limit distribution of multivariate maxima is performed on block 
maxima data extracted from the original data. Maximum likelihood estimators are 
used here. As the extracted data is assumed to be independent, classical estimator 
consistency and convergence theory applies (cf. Sect. 4.2.2.3). 


9.4.3.1 Censored Maximum Likelihood 


By definition, models for the tails of distributions are valid in the tails, i.e., when 
all components are simultaneously extreme valued. It is, therefore, tempting to infer 
such models using data in which all components are extreme values. However, this 
naive approach considerably reduces the number of data that can be used, and makes 
model inference impossible in practice. 

For this reason, censored statistical likelihood is used, whereby information is 
gained from all of the data (see Example 4.9 for more details). The principle is 
simple: data for which certain components are not extreme values are interpreted as 
censored but not removed. Then, each data point contributes to the model likelihood 
in proportion to its number of extreme-valued components. The larger this number, 
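the more it contributes. For instance, in the bivariate case, the likelihood $L(x_1, x_2)$
of a data point $(x_1, x_2)$, for a given threshold $(u_1, u_2)$, is given by

$$
L(x_1, x_2) =
\begin{cases}
F(u_1, u_2) & \text{if } x_1 \le u_1,\ x_2 \le u_2, \\[4pt]
\dfrac{\partial F}{\partial x_1}(x_1, u_2) & \text{if } x_1 > u_1,\ x_2 \le u_2, \\[4pt]
\dfrac{\partial F}{\partial x_2}(u_1, x_2) & \text{if } x_1 \le u_1,\ x_2 > u_2, \\[4pt]
\dfrac{\partial^2 F}{\partial x_1 \partial x_2}(x_1, x_2) & \text{if } x_1 > u_1,\ x_2 > u_2.
\end{cases}
\tag{9.13}
$$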
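The four cases of (9.13) can be sketched as follows. This is only a minimal illustration: the names `censored_contribution`, `censored_log_likelihood` and `toy_cdf`, as well as the step `h`, are assumptions introduced here, and central finite differences stand in for the analytical derivatives of $F$ that a real fit would normally use.

```python
import math


def censored_contribution(joint_cdf, x, u, h=1e-4):
    """Contribution of one point x = (x1, x2) to the censored likelihood (9.13).

    joint_cdf(a, b) is any smooth bivariate distribution function F and
    u = (u1, u2) is the threshold. Derivatives are approximated by central
    finite differences (an assumption made for brevity).
    """
    x1, x2 = x
    u1, u2 = u
    if x1 <= u1 and x2 <= u2:      # both components censored
        return joint_cdf(u1, u2)
    if x1 > u1 and x2 <= u2:       # only the first component is extreme
        return (joint_cdf(x1 + h, u2) - joint_cdf(x1 - h, u2)) / (2.0 * h)
    if x1 <= u1 and x2 > u2:       # only the second component is extreme
        return (joint_cdf(u1, x2 + h) - joint_cdf(u1, x2 - h)) / (2.0 * h)
    # both components are extreme: mixed second derivative
    return (joint_cdf(x1 + h, x2 + h) - joint_cdf(x1 + h, x2 - h)
            - joint_cdf(x1 - h, x2 + h) + joint_cdf(x1 - h, x2 - h)) / (4.0 * h * h)


def censored_log_likelihood(joint_cdf, data, u, h=1e-4):
    """Sum of the log-contributions over all data points (none is discarded)."""
    return sum(math.log(censored_contribution(joint_cdf, x, u, h)) for x in data)


def toy_cdf(a, b):
    """Independent unit-exponential margins, used only to exercise the code."""
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-b))


sample = [(0.5, 0.2), (2.0, 0.2), (0.3, 3.0), (2.0, 3.0)]
log_lik = censored_log_likelihood(toy_cdf, sample, u=(1.0, 1.0))
```

In an actual fit, `joint_cdf` would be the parametric tail model and `log_lik` would be maximized over its parameters.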


A list of parametric models that allow the tail dependence function $\ell$ to be estimated
can be found in [65]. Nonparametric estimators of $\ell$ also exist, but most have
difficulty respecting the constraints on the function. Constructing estimators that do
so is an open problem, even in the bivariate case.

Similarly, several parametric and nonparametric models for the function Ÿ from 
the Ledford and Tawn model can be found in the literature; these are defined starting 
from the spectral measure [243], an alternative way to represent distributions of 
multivariate extreme values. 


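Definition 9.24 (Spectral measure) Let $G$ be a multivariate extreme value distribution.
For all $\mathbf{z} \in [0, \infty]$, the exponential measure $\mu_*$ is defined by

$$
\mu_*\left([0, \mathbf{z}]^c\right) = -\log G(\mathbf{z}). \tag{9.14}
$$

The spectral measure $H$ is the (pseudo-)polar representation of $\mu_*$, defined by

$$
H(B) = \mu_*\left(\left\{ \mathbf{z} \in [0, \infty] : \|\mathbf{z}\| \ge 1,\ \frac{\mathbf{z}}{\|\mathbf{z}\|} \in B \right\}\right) \tag{9.15}
$$

for any Borel set $B$ of the unit sphere in $\mathbb{R}^d$.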


9.4.3.2 Example 


In this section, we illustrate inference of the model obtained from the limit distribution 
of the maxima for the conjunction of flooding of the Vienne and Loire Rivers (case 
1 in Sect. 9.1.3). Earlier in the chapter, we showed that these two phenomena had 
real dependence in their extreme values. 

The two inference strategies were put into practice, with little difference appearing
between them. The dependence function $\ell$ was fitted to a logistic model defined
by

$$
\ell(v_1, v_2) = \left( v_1^{1/\alpha} + v_2^{1/\alpha} \right)^{\alpha}, \quad \forall (v_1, v_2) \in (\mathbb{R}^+)^2, \quad 0 < \alpha \le 1. \tag{9.16}
$$
Table 9.1 Parameters of the marginal distributions of the bivariate distribution of extreme flow
measurements (Vienne and Loire Rivers)

Vienne           Estimation   St. dev.     Loire            Estimation   St. dev.
u_1 (m³.s⁻¹)     1050         -            u_2 (m³.s⁻¹)     2150         -
σ_1 (m³.s⁻¹)     335.29       35.75        σ_2 (m³.s⁻¹)     571.14       49.50
ξ_1              -0.126       0.08         ξ_2              -0.259       0.05
λ_1              0.012        -            λ_2              0.0119       -

Table 9.2 Parameter of the logistic model for dependence in the bivariate distribution of extreme
flow measurements (Vienne and Loire Rivers)

Dependence       Estimation   St. dev.
α                0.879        0.02


The parameter $\alpha$ measures the strength of dependence between the extreme flows
of the two rivers. In particular, total dependence corresponds to $\alpha \to 0^+$ and total
independence to $\alpha = 1$.

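As for $F$, it is given, for each $(x_1, x_2) > (u_1, u_2)$, by

$$
F(x_1, x_2) = \exp\left\{ -\left[ \left(-\log F_1(x_1)\right)^{1/\alpha} + \left(-\log F_2(x_2)\right)^{1/\alpha} \right]^{\alpha} \right\},
$$

where the tails of the marginal distributions $F_i$ are modeled with the help of GPDs:

$$
F_i(x_i) = 1 - \lambda_i \left[ 1 + \xi_i\, \frac{x_i - u_i}{\sigma_i} \right]^{-1/\xi_i}, \quad \forall x_i > u_i,
$$

with $\lambda_i = P(X_i > u_i)$.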

Tables 9.1 and 9.2 summarize the parameter estimation for the whole model. The 
standard deviations of the estimators can be used to obtain confidence intervals constructed
with the help of standard normal distributions. Figure 9.11 plots Pickands'
dependence function A(t) of the model selected and several iso-quantiles defined by 
F (x1, x2) = constant. 
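As a hedged illustration of how the fitted model can be evaluated, the sketch below plugs the point estimates of Tables 9.1 and 9.2 into the logistic joint distribution with GPD tails. The function names and the two flow levels (1500 and 2800 m³/s) are arbitrary choices made for this example, and the case $\xi = 0$ is not handled.

```python
import math


def gpd_tail_cdf(x, u, sigma, xi, lam):
    """Marginal cdf above the threshold: 1 - lam * (1 + xi (x - u) / sigma)^(-1/xi)."""
    if x <= u:
        raise ValueError("the tail model is only defined above the threshold u")
    z = 1.0 + xi * (x - u) / sigma
    if z <= 0.0:  # x lies beyond the finite upper endpoint (xi < 0)
        return 1.0
    return 1.0 - lam * z ** (-1.0 / xi)


def logistic_joint_cdf(f1, f2, alpha):
    """exp(-l(-log f1, -log f2)) with the logistic dependence function (9.16).

    For very small alpha the powers 1/alpha may underflow; this sketch does
    not guard against that.
    """
    v1, v2 = -math.log(f1), -math.log(f2)
    return math.exp(-((v1 ** (1.0 / alpha) + v2 ** (1.0 / alpha)) ** alpha))


# Point estimates read from Tables 9.1 and 9.2
vienne = dict(u=1050.0, sigma=335.29, xi=-0.126, lam=0.012)
loire = dict(u=2150.0, sigma=571.14, xi=-0.259, lam=0.0119)
alpha = 0.879

f1 = gpd_tail_cdf(1500.0, **vienne)  # illustrative Vienne flow level (m^3/s)
f2 = gpd_tail_cdf(2800.0, **loire)   # illustrative Loire flow level (m^3/s)
joint = logistic_joint_cdf(f1, f2, alpha)
```

Whatever the value of $\alpha$ in $(0, 1]$, the result lies between the independent value `f1 * f2` and the totally dependent value `min(f1, f2)`, which gives a quick sanity check on any implementation.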


9.4.3.3 Inferring the Tail Dependence Coefficient 


The previous sections have shown the importance of Ledford and Tawn's tail dependence
coefficient $\eta$, due to it being able to help distinguish between different types
of convergence.

We now have a look at some of the many estimators of $\eta$ available. Several others
can be found in [65], along with their asymptotic properties: those of Peng [605],
Hill [389], Draisma [226], and Beirlant and Vandewalle [68].
Fig. 9.11 Bivariate flow distribution: plot of Pickands' function A(t) (top-left), spectral density
(top-right), and iso-quantile lines (bottom). Bottom: the x-axis indicates the discharge values
(m³/s) of river Vienne, while the y-axis indicates the corresponding discharge values for river
Loire
Ledford and Tawn's estimator is based on the following result: the tail of the
distribution of $T$, a scalar random variable defined by

$$
T = \min(Z_1, Z_2),
$$

where the variables $(Z_1, Z_2)$ are the data transformed according to (9.2), follows
a GPD with shape parameter $\xi = \eta$. Thus, we can estimate $\eta$ by maximizing the
censored likelihood of a GPD model over a sample of data from $T$.

However, construction of a sample from $T$ beginning with the observed data
relies on the modeling of the marginal distribution functions $F_i$; the latter can be
approximated by their empirical cumulative distribution functions, thus giving a
sample denoted $T_{\mathrm{emp}}$, or with the help of the Pareto model estimated from the data.
The latter approach leads to a sample which we call $T_{\mathrm{par}}$. By combining various
estimators of $\eta$ based on samples from $T$ with the multitude of ways of forming this
sample, we can construct a whole range of estimators of $\eta$.

Here are four examples: 


1. Hill's estimator $\hat{\eta}_e^h$, based on $T_{\mathrm{emp}}$;

2. the estimator $\hat{\eta}_e^m$, obtained by maximizing the likelihood of the GPD model based
on $T_{\mathrm{emp}}$;

3. Hill's estimator $\hat{\eta}_p^h$, based on $T_{\mathrm{par}}$;

4. the estimator $\hat{\eta}_p^m$, obtained by maximizing the likelihood of the GPD model based
on $T_{\mathrm{par}}$.


Figure 9.12 for case 4 (cf. Sect. 9.1.3) shows the behavior of these four estimators
as a function of the thresholding parameter $p_0$. This parameter is set at the largest
value for which estimation of $\eta$ is stable and all of the estimators converge to the
same value. We retain $p_0 = 0.96$ and $\eta = 0.51$. This value is very close to 0.5, which
characterizes independence, as is indeed the case here.
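A minimal sketch of the first of these estimators (Hill's estimator on the empirically transformed sample) is given below. Since transformation (9.2) is not reproduced in this section, the code assumes a rank-based transformation to the unit Fréchet scale, $Z = -1/\log \hat{F}(X)$ with $\hat{F} = \mathrm{rank}/(n+1)$; the function names and the simulated data are likewise illustrative assumptions.

```python
import math
import random


def to_unit_frechet(values):
    """Rank-transform a sample to the unit Frechet scale (assumed form of (9.2))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = -1.0 / math.log(rank / (n + 1.0))
    return z


def hill_eta(x1, x2, p0):
    """Hill estimate of eta from T = min(Z1, Z2), using the k = n (1 - p0) largest values."""
    z1, z2 = to_unit_frechet(x1), to_unit_frechet(x2)
    t = sorted(min(a, b) for a, b in zip(z1, z2))
    n = len(t)
    k = int(round(n * (1.0 - p0)))
    if not 1 <= k < n:
        raise ValueError("p0 must leave between 1 and n - 1 exceedances")
    threshold = t[n - k - 1]  # (n - k)-th order statistic of T
    return sum(math.log(v / threshold) for v in t[n - k:]) / k


# Simulated independent pair, for illustration only
rng = random.Random(0)
x = [rng.random() for _ in range(5000)]
y = [rng.random() for _ in range(5000)]
eta_hat = hill_eta(x, y, p0=0.96)
```

Repeating the call over a grid of `p0` values gives the kind of stability plot described above, from which the thresholding parameter is chosen.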


9.4.3.4 Estimating the Probability of Simultaneous Extreme Values 


This section studies several examples of estimators of the probability of simultaneous 
extreme values, defined as exceedances of extreme thresholds. These estimators, 
constructed differently, have different statistical properties. Practitioners need to 
compare and contrast them in order to decide how to proceed. 

We proceed based on the method of Draisma et al. [226], which involves Ledford 
and Tawn’s model (9.10). These estimators are shown here in two dimensions but 
can be generalized to an arbitrary number. 

Let us consider the probability $p = P(X_1 > x_1, X_2 > x_2)$, where $x_1$ and $x_2$ are
marginal extreme values. Denoting $p_1 = P(X_1 > x_1)$ and $p_2 = P(X_2 > x_2)$, it is
possible to show that Ledford and Tawn's model corresponds to approximating $p$ by

$$
p \approx \left(\frac{c}{q_0}\right)^{1/\eta} P\left(1 - F_1(X_1) < \frac{q_0\, p_1}{c},\ 1 - F_2(X_2) < \frac{q_0\, p_2}{c}\right), \tag{9.17}
$$
Fig. 9.12 Case 4: estimating $\eta$ as a function of the parameter $p_0$
where $q_0$ is a probability for which $q_0 \to 0$, and

$$
c = \sqrt{p_1^2 + p_2^2}.
$$

The estimators detailed here are all derived from (9.17) and involve various estimates
of the quantity

$$
\tilde{p} = P\left(1 - F_1(X_1) < \frac{q_0\, p_1}{c},\ 1 - F_2(X_2) < \frac{q_0\, p_2}{c}\right),
$$

alongside the tail coefficient $\eta$, estimated separately.

Estimator $\hat{p}_e$ uses the empirical estimator of the mean to approximate the
probability $\tilde{p}$ appearing in (9.17) and models the marginal distributions $F_1$ and $F_2$
using their empirical estimators. If we take $q_0$ such that $n q_0 \to \infty$, then since
$0 < p_1/c,\ p_2/c < 1$, a certain number of observations $X_{1,k}$ (resp. $X_{2,k}$) have a
relative rank above $1 - q_0 p_1/c$ (resp. above $1 - q_0 p_2/c$). The
use of the empirical cumulative distribution function is thus possible.

Estimator $\hat{p}_p$ uses the empirical estimator of the mean to approximate $\tilde{p}$, and
models the marginal distributions $F_1$ and $F_2$ with the help of Pareto
models.

These first two estimators were studied in [226, 369], and their asymptotic distributions
provided.

Estimator $\hat{p}_r$ (from [133]): set $r = p_2/p_1$ and let $T_r$ be the variable defined by

$$
T_r = \min\left( \frac{1}{1 - F_1(X_1)},\ \frac{r}{1 - F_2(X_2)} \right).
$$
Fig. 9.13 Probability of simultaneous exceedances of extreme thresholds as a function of a parameter
$p_0$: estimators $\hat{p}_e$, $\hat{p}_p$, $\hat{p}_r$, and the estimator of (9.19)


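Then, [133] showed, under the Ledford and Tawn model's hypotheses, that $p$ can be
estimated consistently by

$$
\hat{p}_r = \frac{k}{n} \left( p_1\, T_{r,(n-k,n)} \right)^{1/\hat{\eta}}, \tag{9.18}
$$

where $T_{r,(n-k,n)}$ denotes the $(n-k)$-th order statistic of the $n$ values of $T_r$.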

The last estimator takes temporal correlation in $T_r$ into account. Under certain
conditions on the correlation (which has to be short term), the limit distribution of the
maxima of $T_r$ is still a GEV distribution with parameters $(\xi, \sigma, \mu)$. Let $\tilde{T}_r$ be the
independent process with the same marginal distribution as $T_r$, in the domain of
attraction of $\tilde{G}$, a GEV with parameters $(\tilde{\xi}, \tilde{\sigma}, \tilde{\mu})$. Then it is possible to link $(\xi, \sigma, \mu)$
and $(\tilde{\xi}, \tilde{\sigma}, \tilde{\mu})$ [65, p. 377] with the help of the extremal index $\theta$. Details of this can
be found in Chap. 6.

We can, therefore, deduce that the parts of $T_r$ exceeding the threshold $\tilde{u}$ follow a
GPD distribution with parameters $(\xi, \tilde{\sigma})$. This estimator is then defined by

$$
\hat{p} = \left( \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}_{\{T_{r,k} > \tilde{u}\}} \right) \left( 1 + \hat{\xi}\, \frac{1/p_1 - \tilde{u}}{\hat{\tilde{\sigma}}} \right)^{-1/\hat{\xi}}. \tag{9.19}
$$


Figure 9.13 shows the graphs obtained when estimating the probability of simultaneous
exceedances of extreme thresholds for case 4, as a function of a parameter
$p_0$. This parameter is set at the largest value for which the probability estimates are
stable and converge.
Fig. 9.14 Probability density of Vienne River flows when the Loire River is undergoing a 100-year
flood


Table 9.3 Probabilistic characteristics of the flow distribution of the Vienne River in conjunction
with a 100-year flood of the Loire River

Probabilistic quantities     Vienne flow rates (m³.s⁻¹)
Mean                         2513
St. dev.                     141
Median                       2517
Quantile of order 25%        2436
Quantile of order 75%        2629

9.4.4 Using the Assessed Models 


We now describe several of the many ways to make use of results coming from a 
multivariate analysis. 

When the analysis finds an expression for the tail of the multivariate distribution
F of the phenomena, it is then simple to model various extreme-valued conditional
scenarios. This is done by setting one of the phenomena to a particular scenario; the
goal is then to examine the impact of this on the other phenomena.

For example, in Case 1 with the conjunction of two rivers flooding, we can look
at the distribution of flows for the Vienne River when the Loire is in a 100-year flood
situation. Figure 9.14 plots the probability density and the cumulative distribution
function of the flow distribution in the latter setting. Table 9.3 lists several related
probabilistic features.

The expression for $F$ can also be used to help calculate iso-quantile surfaces,
defined for levels $0 < q < 1$ by

$$
S_q = \left\{ \mathbf{x} \in \mathbb{R}^d \mid F(\mathbf{x}) = q \right\}.
$$

Fig. 9.15 Annual iso-frequency contours (log scale) with levels $10^{-9}$, $10^{-8}$, $10^{-7}$, and $10^{-4}$, in
the return period space for flow and wind (Np, Ny): simultaneous extreme values with the same
annual frequency


If we associate the notion of quantile with that of criticality (or extremeness), then 
these surfaces correspond to the set of iso-risk combinations. For example, Fig. 9.11 
shows the iso-quantile lines of order q = 0.99, 0.995, and 0.999 of the bivariate flow 
distribution. 
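In the bivariate case, an iso-quantile line can be traced numerically by solving $F(x_1, x_2) = q$ for $x_2$ on a grid of $x_1$ values. The sketch below does this by bisection; the function names, the bracketing interval and the toy distribution function (independent exponential margins) are assumptions made for illustration and are unrelated to the fitted flow model.

```python
import math


def iso_quantile_point(joint_cdf, q, x1, lo, hi, tol=1e-9):
    """Solve joint_cdf(x1, x2) = q for x2 in [lo, hi] by bisection.

    Assumes joint_cdf is continuous and non-decreasing in x2. Returns None
    when the level q is not bracketed by [lo, hi] for this value of x1.
    """
    if joint_cdf(x1, lo) > q or joint_cdf(x1, hi) < q:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if joint_cdf(x1, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iso_quantile_curve(joint_cdf, q, x1_grid, lo, hi):
    """List of points (x1, x2) lying on the iso-quantile line of level q."""
    points = []
    for x1 in x1_grid:
        x2 = iso_quantile_point(joint_cdf, q, x1, lo, hi)
        if x2 is not None:
            points.append((x1, x2))
    return points


def toy_cdf(a, b):
    """Independent unit-exponential margins, used only to exercise the solver."""
    if a <= 0.0 or b <= 0.0:
        return 0.0
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-b))


curve = iso_quantile_curve(toy_cdf, 0.99, [5.0, 6.0, 8.0], 0.0, 50.0)
```

Values of $x_1$ whose marginal probability is already below $q$ admit no solution and are simply skipped.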

Estimators for the probability of simultaneous exceedances of extreme-valued 
thresholds can be used to plot probability maps. For instance, for Case 4, Fig. 9.15 
shows annual iso-frequency contours, defined in terms of the mean number of days 
per year where simultaneous storms and floods occurred. The level of extremeness 
of each marginal phenomenon is given by its return level. Chapter 17 also shows a 
probability map in a case of positively associated asymptotic independence. 

Finally, let us mention an original use of the function $u \mapsto \chi(u)$ in the asymptotic
independence case: bounding $\chi(u)$ for $u > u_s$ makes it possible to bound the
probability of simultaneous extreme events occurring. Though less precise than an
actual estimation of the probability, an upper bound on the latter probability may
sometimes be sufficient in showing that the simultaneous extreme events in question
are so unlikely to occur together that they are not worth studying in more detail.
This type of reasoning assumes that the practitioner has provided extremely low
probabilities of occurrence below which they will not study these events in more
detail.

For instance, in Case 3, we found that the low-temperature and river baseflow
phenomena had positively associated asymptotic independence. The hypothesis
of their independence would, therefore, lead to an underestimate of the probability
of their simultaneous extreme-valued occurrence.
A study of the function $u \mapsto \chi(u)$ in Fig. 9.8, estimated on the opposite values
of the phenomena (do not forget that the theory provides tools for studying extremely
high values!), shows that for any $u > u_s$, $\chi(u) < \varepsilon_s$, where $u_s = 0.9$ and $\varepsilon_s = 0.3$.
Note that the end of the curve should not be taken into consideration because estimation
of $\chi$ takes place via the empirical estimator of $C(u, u)$, which is imprecise
when $u$ tends to 1 for lack of data.

A little algebra allows us to link the quantile $q$ of order $u$ of $(-X)$ to the quantile of
order $(1 - u)$ of $X$: $P(-X < q) = u$ is equivalent to $P(X < -q) = 1 - u$.
We can then return to the initial flow and temperature processes. Thus, an upper bound
on $\chi(u)$ implies that for any extremely low order $v$ such that $v < 1 - u_s = 0.1$, we
have

$$
P(Q < q_v,\ T < t_v) \le \varepsilon_s\, v,
$$

where $q_v$ and $t_v$ are the order $v$ quantiles of the flow variable $Q$ and air temperature
$T$.


9.5 Conclusion 


Extreme value multivariate analyses, like univariate ones, require a certain number of 
choices on the part of the analyst. There is not yet a completely automatic and general 
approach available, and one often has to take details of the actual environmental
processes of interest into account when making decisions.

In addition, certain multivariate problems remain little-studied in the literature. 
Let us now mention a few such difficulties that we are often faced with. 

Temporal correlation between the observed values of each natural process (e.g., 
temperature, flow, etc.) complicates estimation of the joint distribution of multiva- 
riate processes. For example, there are still very few simple rules for the statistical 
estimation of the extremal index coefficient in the multivariate setting. More gen- 
erally, the processes we study are often non-stationary, e.g., a trend in temperature 
measurements due to global warming. In such cases, it becomes impossible to apply 
the results presented in the chapter. 

The threshold exceedance approach—though well-developed in the univariate 
setting—does not easily follow through to the multivariate one. An extension of the 
unifying framework of colored point processes to the multivariate setting, initiated 
by [546], is an area of active research. Indeed, in the multivariate case, it is difficult
to define what clusters are, i.e., sequences of successive exceedances of a threshold
that can be seen as members of the same exceedance event. Little work has been
attempted on this, though we can cite two multivariate declustering methods: one
from Coles and Tawn [178] and the other from Nadarajah [542]. However, these 
methods do not work with all types of dependence and require careful choices for 
numerous parameters. 
The joint distribution estimated by extreme value theory is only valid in the zone
where all of the components are simultaneously extreme valued. However, zones 
where only certain components are extreme valued are also interesting. In such 
cases, the results we have presented here cannot be used, and new techniques need 
to be developed. 

Finally, note that the notion of a multivariate return period is more complicated 
to define than in the univariate setting. One way it can be defined is as the mean 
waiting time between two successive occurrences of the process in a domain chosen 
by the practitioner. This domain can be that which corresponds to all (or some) of 
the processes’ components exceeding some extreme-valued threshold. 

Kendall has suggested an interesting and original definition of this domain: it is 
defined by events of the process corresponding to quantiles above some given level. 
This, therefore, links the notion of quantile with that of criticality or extremeness 
[328, 329, 334]. On the iso-quantile surface, it, therefore, corresponds to choos- 
ing a particular combination that a practitioner will retain when dimensioning their 
structure. This choice may be based on probabilistic (the most likely combination) 
or project-specific (the combination which would have the largest impact on the 
structure in question) considerations. 


Chapter 10
Stochastic and Physics-Based Simulation
of Extreme Situations


Sylvie Parey, Thi-Thu-Huong Hoang, and Nicolas Bousquet 


Abstract This chapter addresses two alternative approaches to extreme situations, 
which may be useful when the lack of extreme observations severely limits the rele- 
vance of a purely statistical approach. The first methodology is based on stochastic 
modeling of the regular phenomenon, for example, via autoregressive processes. 
Capturing this phenomenon allows extrapolation to extreme values based on theo- 
retical properties of stochastic processes. The second approach is based on the use, 
by Monte Carlo-based methods, of numerical simulation models implementing the 
physical equations representing the phenomenon under study. This second approach,
which originated in structural reliability, requires the development of specific
simulation techniques that focus on very low-probability events; it appears to be
more and more appropriate as expert knowledge of the phenomena increases.


10.1 Introduction 


The preceding chapters have shown that it is not straightforward to directly apply 
the statistical theory of extreme values to natural hazards. On the one hand, variables 
are not necessarily stationary nor independent, and on the other, there is sometimes 
simply not enough data available for the theory to be applicable or the results reliable. 
Indeed, the theory is based on asymptotic convergence toward known extreme value 
distributions. The actual amount of available data is always limited, but it is impor- 
tant that the number of values over which a maximum is extracted be sufficiently 
large—or the threshold in the POT approach sufficiently high—for convergence to 
be considered attained. By way of example, when we wish to estimate the return 
level of a frost index (defined as the sum of the absolute values of temperatures 
below 0°C [754]), there are in general very few values per year to work with. In this 
specific case, it is preferable to use a daily mean temperature model to simulate a 
larger sample of frost indexes. It is therefore essential to have stochastic models that
correctly simulate the entire distribution, including its extreme values. This approach
is considered in the first part of this chapter (Sect. 10.2).
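The frost index mentioned above is simple to compute once daily mean temperatures are available, whether observed or simulated. In the short sketch below, `simulate_winter` is a hypothetical user-supplied generator (for instance, a stochastic temperature model) and is not part of any particular library.

```python
def frost_index(daily_mean_temps_celsius):
    """Sum of the absolute values of the daily mean temperatures below 0 degrees C."""
    return sum(-t for t in daily_mean_temps_celsius if t < 0.0)


def simulated_frost_indexes(simulate_winter, n_winters):
    """Frost indexes of n_winters simulated winters.

    simulate_winter() is any callable returning one winter of daily mean
    temperatures (hypothetical; to be provided by the user).
    """
    return [frost_index(simulate_winter()) for _ in range(n_winters)]


example = frost_index([2.0, -1.0, -3.5, 0.0, 4.0])
```

A long simulated series then provides as many frost indexes as needed for a subsequent extreme value analysis.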

When dangerous events are extreme-valued examples of physical phenomena that
can be described by a series of physics equations (giving a mechanistic model of the
phenomenon), and when this model can be implemented on a computer, the numerical
model (i.e., code) thus created can also be used to simulate virtual observations of
the hazard and calculate occurrence probabilities and return levels. This alternative
approach to the use of extreme value statistical theory is deeply associated with the 
field of structural reliability [475], which studies the response of complex physical 
systems (e.g., buildings) to probabilistic stimuli (e.g., strong winds) that are rarely 
or never seen. 

For instance, the flow dynamics of a river are well described by free surface fluid 
dynamics equations. A set of solvers and tools dedicated to solving problems arising 
from these are found in the TELEMAC-Mascaret¹ software, initially developed by
EDF/LNHE and managed since 2010 by a consortium of six international research 
centers (including EDF R&D). This software permits simulation of downstream river 
flows, taking as input parameters a set of upstream flows, information on the river’s 
geometry, and its friction parameters. Depending on the level of detail sought at the 
output, and knowledge available on the inputs, adding probability distributions to the 
latter makes it possible to simulate a distribution of flows, followed by water levels, 
and conduct a Monte Carlo-based analysis. 
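The principle of such a Monte Carlo analysis can be sketched with a toy stand-in for the hydraulic code. The sketch below does not call TELEMAC-Mascaret: it replaces the numerical model with a uniform-flow (Manning-Strickler type) formula for a wide rectangular channel, and every distribution and numerical value (channel width, slope, Weibull flow, Gaussian friction coefficient, dike height) is a hypothetical choice made purely for illustration.

```python
import math
import random


def water_height(q, ks, width=300.0, slope=1e-3):
    """Toy numerical model: h = (q / (ks * width * sqrt(slope)))**(3/5)."""
    return (q / (ks * width * math.sqrt(slope))) ** 0.6


def exceedance_probability(h_dike, n_sim, rng):
    """Crude Monte Carlo estimate of P(water height > h_dike) and its standard error."""
    count = 0
    for _ in range(n_sim):
        q = rng.weibullvariate(1500.0, 1.5)               # upstream flow (m^3/s), scale and shape
        ks = min(max(rng.gauss(30.0, 5.0), 15.0), 60.0)   # friction coefficient, clipped to [15, 60]
        if water_height(q, ks) > h_dike:
            count += 1
    p = count / n_sim
    std_err = math.sqrt(p * (1.0 - p) / n_sim)
    return p, std_err


p_overflow, p_std_err = exceedance_probability(5.0, 20000, random.Random(42))
```

With crude sampling, the relative precision degrades quickly as the target probability decreases, which is why a costly numerical model calls for more specialized simulation techniques.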

Two important problems associated with Monte Carlo-based approaches are dis- 
cussed in the second part of the chapter (Sect. 10.3): difficulties due to the calculation 
time required to explore the input parameter space before reaching areas characteriz- 
ing extreme-valued outputs (often cripplingly long), and the validity of these models 
for such extreme situations of interest. This second point is crucial because such 
situations are sometimes characterized—in the case of natural hazards—by a sudden 
change in the physics. For instance, a flood can be defined as a dike overflow, which 
can highly modify the flow by adding new physical processes to the scenario. The 
domain of validity of an implemented numerical model may be strongly affected 
by the influence of these new physics and the presence of so-called edge effects 
such as local losses of continuity or differentiability. The uncertainty surrounding 
the extrapolation of a numerical model beyond its domain of validity is therefore 
likely to severely limit its use in practice. 


10.2 Stochastic Models 


The scientific literature is rich with works on the stochastic modeling of environ- 
mental variables, especially meteorological ones. These models are generally char- 
acterized by their faithful reproduction of the average, median, quartiles, etc., of the 


¹ See www.opentelemac.org.
observed statistical distribution of a variable. However, they have more difficulty in
correctly reproducing variability and, in particular, extreme values of phenomena
[296].


10.2.1 A First Example: Air Temperature 


The environmental variable most associated with stochastic modeling is without 
doubt the air temperature at a given altitude. The general principle underlying stochas- 
tic models of the daily temperature (whether it be the mean, maximum, or minimum), 
denoted $X(t)$, is the combination of a deterministic part $\{A(t), \Phi(t)\}$ and a stochastic
process $Y(t)$ for relevant random fluctuations, which aims to encapsulate natural
variation in the phenomenon (see Sect. 3.1.2):

$$
X(t) = A(t) + \Phi(t)\, Y(t).
$$


• The component $A(t)$ contains at least one seasonal cycle $S(t)$ [136], such that
(typically):

$$
S(t + T) = S(t),
$$

and fairly often a trend $T(t)$ modeled by linear regression [136]:

$$
T(t) = a\,t. \tag{10.1}
$$
• The stochastic part is generally given by a stationary autoregressive (AR) process
of order $p$ of varying levels of sophistication. Typically this is in the form of an
AR(1) ($p = 1$) up to AR(3) ($p = 3$) model:

$$
Y(t) = a_0 + a_1 Y(t-1) + a_2 Y(t-2) + \ldots + a_p Y(t-p) + \varepsilon_t,
$$

where $(\varepsilon_t)_t$ may be either:

(a) a white noise, i.e., a sequence of centered uncorrelated variables with constant
variance:

$$
E[\varepsilon_t] = 0, \quad V[\varepsilon_t] = \sigma^2, \quad \mathrm{Cov}[\varepsilon_t, \varepsilon_{t'}] = 0 \ \text{ for } t \neq t',
$$

(b) or a GARCH($p$, $q$) process (generalized autoregressive conditional heteroskedasticity),
a standard choice in time series econometrics [100], defined by

$$
E[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0,
$$

$$
V[\varepsilon_t \mid \mathcal{F}_{t-1}] = \sigma_t^2 = \omega + \sum_{i=1}^{q} \beta_i\, \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \gamma_j\, \sigma_{t-j}^2,
$$

where $\mathcal{F}_{t-1}$ is the filtration (or $\sigma$-algebra) representing the past of the process
$(\varepsilon_t)_t$. This is simply called an ARCH process when $p = 0$ [252].


For some models, the variance can be seasonal; a yearly cycle seen in the variance 
can be modeled using seasonal coefficients [649]. 
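The structure described above can be sketched with the simplest members of each family: an AR(1) process driven by GARCH(1, 1) innovations, rescaled by a seasonal factor and added to a seasonal cycle plus a linear trend. The cosine form of the seasonal terms, the Gaussian innovations and all default parameter values are assumptions chosen for illustration, not fitted values.

```python
import math
import random


def simulate_temperature(n_days, rng, a1=0.7, omega=0.2, beta1=0.1, gamma1=0.7,
                         mean_level=12.0, season_amp=8.0, trend_per_day=0.0,
                         scale_mean=3.0, scale_amp=0.5, period=365.25):
    """Simulate X(t) = A(t) + Phi(t) Y(t) with Y an AR(1) and GARCH(1, 1) innovations.

    beta1 multiplies the lagged squared innovation and gamma1 the lagged
    conditional variance; beta1 = gamma1 = 0 gives a white noise of variance omega.
    """
    if beta1 + gamma1 >= 1.0:
        raise ValueError("beta1 + gamma1 must be below 1 for a finite variance")
    y_prev, eps_prev = 0.0, 0.0
    var_prev = omega / (1.0 - beta1 - gamma1)  # unconditional variance as starting value
    series = []
    for t in range(n_days):
        var_t = omega + beta1 * eps_prev ** 2 + gamma1 * var_prev
        eps_t = math.sqrt(var_t) * rng.gauss(0.0, 1.0)
        y_t = a1 * y_prev + eps_t                       # AR(1) with a0 = 0
        phase = 2.0 * math.pi * t / period
        a_t = mean_level + season_amp * math.cos(phase) + trend_per_day * t
        phi_t = scale_mean + scale_amp * math.cos(phase)
        series.append(a_t + phi_t * y_t)
        y_prev, eps_prev, var_prev = y_t, eps_t, var_t
    return series


ten_years = simulate_temperature(3650, random.Random(4))
```

Because the innovations here have unbounded tails, such a simulator shares the limitation discussed next regarding the reproduction of extreme values.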


However, the behavior of extreme values (their occurrence, heatwaves, cold snaps,
etc.) cannot be well reproduced by analyzing the large deviations of these models,
in particular, due to poor estimation of the conditional variance $V[\varepsilon_t \mid \mathcal{F}_{t-1}]$ [296].
In reality, all studies based on the temperatures of the last 60 years indicate that the
temperature distribution is bounded, meaning a negative shape parameter [427];
this holds independently of the method used to study the extreme values: GEV, POT, Hill's
estimator, on either the original or normalized data. For this reason, linear Gaussian
models or GARCH processes with unbounded tails generally model the extreme
values poorly.


10.2.2 Second Example: Rainfall 


Precipitation is a more complex variable, with high variability (both temporally 
and spatially [187, 740]), but is in particular characterized by numerous zeros: days 
when it does not rain. This random variable is therefore both continuous and discrete. 
Because of this, most rainfall generators involve two separate simulators: 


1. A rainfall simulator, generally constructed as a discrete Markov process (cf. 
Definition 4.2), whose states (classes) may represent: 


(a) a rain or no-rain event [513, 786] (two classes);
(b) a dry day, light rain, heavy rain [743] (three classes);
(c) hidden states of the regional climate [24].


2. A second simulator for the amount of precipitation, conditional on the occurrence
of a rain event, based on one of the following distributions:


(a) a Gaussian distribution truncated to be always positive, and taken to a certain 
power to recreate the heavy tails of precipitation distributions [24]; 

(b) a standard exponential distribution, gamma distribution, or a mixture of 
exponentials. The latter is used notably by the DTG (General Technical 
Division), EDF’s team of hydraulics experts. 
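The two-stage structure of such generators can be sketched as follows, with a two-state (dry/wet) first-order Markov chain for occurrence and a gamma distribution for the amounts on wet days. The transition probabilities and gamma parameters are illustrative assumptions, not values taken from any of the cited generators.

```python
import random


def simulate_rainfall(n_days, rng, p_wet_after_dry=0.3, p_wet_after_wet=0.6,
                      gamma_shape=0.8, gamma_scale=6.0):
    """Daily rainfall amounts (mm): Markov chain occurrence, gamma amounts on wet days."""
    rain = []
    wet = False  # the chain starts from a dry day
    for _ in range(n_days):
        p_wet = p_wet_after_wet if wet else p_wet_after_dry
        wet = rng.random() < p_wet
        rain.append(rng.gammavariate(gamma_shape, gamma_scale) if wet else 0.0)
    return rain


one_year = simulate_rainfall(365, random.Random(7))
```

The long-run fraction of wet days is `p_wet_after_dry / (1 + p_wet_after_dry - p_wet_after_wet)`, which can be compared with the observed frequency of rainy days when calibrating the occurrence model.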


In parallel, episode models can also be used. These no longer model pointwise precipitation
at given time steps, instead modeling rainfall episodes spread out over several
time steps [377, 486, 662]. These models have the advantage of reproducing the main features
of precipitation observed over different time steps: intra-day, daily, and multi-day.
10.2.3 Third Example: Wind Speed


Numerous stochastic models make it possible to represent wind speed (sometimes
coupled with other climatic and spatial variables to form generators [110]), but these
models remain questionable with respect to the behavior of extreme values. In reality,
wind is a naturally unstable variable; it is intermittent and its properties (including 
how its extreme values behave) can be different depending on the time step used. 

For the wind speed variable, it is sometimes accepted that extreme wind speed
values are Gumbel distributed. More generally, the Weibull distribution is accepted
as correctly representing wind speed distributions, and its extreme values belong to
the Gumbel domain of attraction [31, 83, 198]. It does not seem, moreover, that
there is an observed upper limit for wind speed, which is also consistent with a 
Gumbel distribution. 

However, some reject the hypothesis of “Gumbel” behavior for extreme wind 
speeds. The authors of [424, 772] show that the wind speed is better represented using 
a bounded distribution. According to [757], “intuitively and physically”, the wind 
speed should have an upper limit and thus be bounded. Many other distributions have
been proposed to fit wind speed distributions, which are non-negative and
right-asymmetric. Examples include the Weibull, gamma, Rayleigh, truncated normal,
log-normal, square root normal, skew-t, and the mixture of a Weibull and a normal
distribution. For more details, see [142]. The Markovian models proposed by [269],
which adopt a POT approach for the behavior of the extreme values, propose for their 
part an explicit representation of the temporal dependence, and are in this sense close 
to the non-stationary statistical models proposed in Chap. 8, and the multivariate ones 
in Chap. 9. 


10.2.4 Fourth Example: Solar Radiation 


The most popular stochastic models for daily, monthly, or annual solar radiation accu- 
mulation are based on linear or quadratic regression linking components of radiation 
(direct, diffuse, albedo*), duration, cloudiness, temperature, and other meteorologi- 
cal variables [28, 571]. Markov chain approaches have also been used, such as the 
one in [747], which models radiation using a continuous-time Markov chain. 

Solar radiation is, in essence, a bounded variable: with a clear sky and no clouds, 
the earth receives direct solar radiation. It is therefore possible to estimate this upper 
bound by dividing the maximum observed irradiance by the clear-sky index. The 
latter is the ratio between the total irradiance observed on Earth and the (theoretical) 
irradiance that would be observed with clear skies [93, 535, 631]. Nevertheless, it 
would seem that no up-to-date study has specifically discussed the tail behavior of 
solar radiation. 
10.2.4.1 Discretizing Continuous Stochastic Processes


Note that environmental variables are continuous over time. Although they are
measured at discrete time steps, it would thus appear appropriate to model their
evolution using continuous stochastic processes such as diffusion processes, of which
Brownian motion and Wiener processes are the best known.


Definition 10.1 (Diffusion process) A diffusion process $\{X_t\}_t$ is a stochastic
process which is the solution to a stochastic differential equation of the form

$$\frac{dX_t}{dt} = \mathcal{M}(X_t) + \sum_{k=1}^{n} g_k(X_t)\,\xi_k(t), \qquad (10.2)$$

where $X_t$ is a differentiable random process, $\mathcal{M}$ a term representing the
deterministic part of the evolution of $X_t$, and $\{g_k\}_k$ a set of functions coupling
the process’s evolution with a vector of white noise $\{\xi_k\}_k$ (usually Gaussian).
The solution $\{X_t\}_t$, when it exists, is a continuous-time Markov process such that
there is almost surely a sample path from each state of $X_t$ to each other state.


This stochastic differential equation represents the noisy dynamics of a physical
phenomenon. Though (10.2) does not have only one interpretation and can lead
to several types of solution (Itô or Stratonovich formulation), it is unambiguously
linked to the Fokker-Planck equation, which characterizes the temporal evolution
of the probability density of the speed of a particle under the influence of drag and
random forces. The latter can in turn be linked with the Langevin equation, which
approximates the behavior of a system of particles. The choice of a solution process
for modeling certain macroscopic variables thus implies, in theory, a change of scale
and a relaxation of constraints in the description of underlying physical phenomena.
This is however supported by the observed behavior of certain environmental
variables (e.g., seismic activity [504]), and the possibility of performing
approximations (of the Gauss-Galerkin type) under boundary conditions on the phenomena
using sequences of discretized measures [248].

The discretization of a continuous process comes, however, with dangers since it is 
generally not guaranteed that theoretical results applying to the continuous setting— 
such as the tail behavior of distributions—are transferable to the world of discrete 
processes. By way of example, if only wind speed measurements at discrete time 
steps are taken into account, certain extreme events may not be recorded. In such 
cases, the characteristics of the extreme values may change depending on the time 
step chosen. Tail behavior, however, is preserved in the case of diffusion processes, 
which makes them of particular interest: the tail behavior of iid discrete variables 
derived from the marginal distribution of the process remains the same as that of the 
continuous-time diffusion process. 

This justifies the pertinence of an approach in which a discretized version of a
diffusion process is used as a simulation model for the discrete variables, and extreme
value theory specific to diffusion processes [85, 200] is used to correctly reproduce
discrete extreme values.
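
To make the idea of a discretized diffusion concrete, the following minimal sketch
simulates a one-dimensional diffusion with drift b and diffusion coefficient a using the
Euler-Maruyama scheme, then subsamples it at a unit time step. The function name, the
Ornstein-Uhlenbeck example, and all parameter values are illustrative assumptions; this
is not the temperature model of [191].

```python
import numpy as np

def euler_maruyama(b, a, y0, dt, n_steps, rng):
    """Simulate dY = b(Y) dt + a(Y) dW on a regular grid of step dt."""
    y = np.empty(n_steps + 1)
    y[0] = y0
    noise = rng.standard_normal(n_steps)
    for i in range(n_steps):
        # Euler-Maruyama update: drift * dt + diffusion * sqrt(dt) * N(0, 1)
        y[i + 1] = y[i] + b(y[i]) * dt + a(y[i]) * np.sqrt(dt) * noise[i]
    return y

# Hypothetical example: Ornstein-Uhlenbeck process, b(y) = -theta * y, a(y) = sigma
theta, sigma = 0.5, 1.0
rng = np.random.default_rng(42)
path = euler_maruyama(lambda y: -theta * y, lambda y: sigma, 0.0, 0.01, 200_000, rng)

daily = path[::100]                       # keep one value per unit of time
stationary_var = sigma**2 / (2 * theta)   # theoretical stationary variance of this example
print(path[20_000:].var(), stationary_var)
```

The empirical variance of the fine-grid path should be close to the theoretical
stationary variance, while the subsampled series `daily` plays the role of the discrete
observations.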

The use of stationary diffusion processes defined with Gaussian noise has proved
particularly useful for modeling the (residual) air temperature. The following section
describes this further. These same tools can be applied to radiation, combined with
the use of a hidden Markov model [747]. For other variables such as wind and
rain, physical constraints pose numerous problems limiting the relevance of using
diffusion processes. Wind is unstable and has an asymmetric distribution, requiring
a transformation step (e.g., log or logit) applied to the discretized data to get back
to the Gaussian setting. As for rainfall, the presence of a large probability mass at
zero means that this variable is not continuous, but we could consider using a
diffusion model alongside a Markov chain, as in [747].


10.2.5 Stochastic Modeling Using Bounded Diffusion 
Processes 


This section summarizes an example of stochastic modeling applied to air temper- 
atures, as detailed in [191]. The approach is quite general: first, remove the deter- 
ministic parts (trends, seasonality) to make the variable more stationary, then model 
what remains as a discretized diffusion process. 


10.2.5.1 Stationarity 


The direct use of a stochastic model (e.g., a diffusion one) whose coefficients vary as
a function of time is not feasible. First, we must deal with the presence of a trend and
seasonality in temperature time series. Indeed, such time series with time steps of
one day exhibit clear seasonal cycles and trends (Fig. 10.1). The idea is to make the
given temperature time series $\{X_t\}_t$ stationary by removing trends and seasonality
estimated based on the following model: $\forall t \in [1, T] \subset \mathbb{N}$,

$$X_t = m_t + u_t + s_t v_t Y_t, \qquad (10.3)$$

where $\{u_t, v_t^2\}$ and $\{m_t, s_t^2\}$, respectively, correspond to seasonality and
trends present in the mean and variance. $\{Y_t\}_t$ is called a residual time series.
Model (10.3) is identifiable if the following conditions are satisfied [191]:


T 


fi 
Smit) = 0, yao = 1. (10.4) 
t=1 


t=1 


Trends are usually modeled using linear functions as in (10.1). However, this is
not of use here because the trend depends on the considered period. It is preferable to
use local regression modeling of the LOESS type [169], which is more flexible. As
in most nonparametric approaches, LOESS requires an optimal estimation window
(in years) to be determined, inside of which groups of data are considered
homogeneous. However, the temperature is a strongly autocorrelated variable, meaning
that standard selection criteria such as cross-validation are difficult to apply usefully.
A new criterion, proposed in [391], can help determine the optimal window size.
In most temperature studies, a window of 10 to 15 years is generally used for
estimating low-frequency trends.

Fig. 10.1 Comparisons between the daily means (top) and variances (bottom) of observed data and
simulations (with a 95% confidence interval)

More generally, estimation of the other parameters in Model (10.3) proceeds as 
follows (technical details can be found in [191, 391]). 
(a) Estimating seasonality $u_t$. Additive seasonal effects $u_t$ can be adequately
represented in terms of trigonometric polynomials:

$$P_p(t) = \gamma_0 + \sum_{k=1}^{p} \left[ \gamma_k \cos\frac{2\pi t k}{365} + \omega_k \sin\frac{2\pi t k}{365} \right], \qquad (10.5)$$

and estimated using the series $X_t$. The degree $p$ is chosen using AIC [25] or small
sample corrected versions of it (see, for example, [400]). This approach leads
to quite similar results to those obtained using nonparametric approaches of the
STL type (seasonal-trend decomposition based on LOESS smoothing [167]).


(b) Estimating the trend $m_t$. Once $u_t$ has been estimated by $\hat u_t$, consider the
model

$$\tilde X_t = m_t + \tilde Y_t, \qquad (10.6)$$

where $\tilde X_t = X_t - \hat u_t$. The series $\tilde Y_t$ is not supposed stationary.
Estimation of $m_t$ is performed using LOESS regression, with the help of the following
theorem proved in [391], extending a result from [675]. This theorem results in quadratic
control of the difference between the estimator and the unknown function $m_t$
under a condition on the covariance of the $\tilde Y_t$. The use of a LOESS estimator
(see Sect. 4.3.2) implies the choice of a kernel function $K$ with compact support
and smoothing (bandwidth) parameter $h_T$ for which $h_T \to 0$ and $T h_T \to \infty$ as
$T \to \infty$.


Theorem 10.1 Suppose that $m_t$ is a twice-differentiable function for $t \in \mathbb{N}$
(i.e., $m \in C^2$), that $\sup_{[1,T]} E[\tilde Y_t^2] < \infty$, and that $\tilde Y_t$
satisfies the following condition:

$$\sup_{[1,T]} C_T(\tilde Y_t) = C(\tilde Y_t) < \infty, \qquad (10.7)$$

where for any square-integrable stochastic process $W$,

$$C_T(W_i) = \sum_{j=1}^{T} \left| \mathrm{Cov}(W_i, W_j) \right|. \qquad (10.8)$$


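Let $\hat m_t$ be the LOWESS estimator for $m_t$. Then

$$E(\hat m_t - m_t)^2 = \left( \frac{1}{2}\, \mu_2\, m''_t\, h_T^2 \right)^2 + \frac{1}{T h_T}\, M_0\, C_{\tilde Y}(t) + o\!\left( h_T^4 + \frac{1}{T h_T} \right), \qquad (10.9)$$

where $\mu_j$ is the $j$-th order moment of $K$.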

For a stationary process, Condition (10.8) implies weak dependence (in $L^2$), while
for a non-stationary one, it controls the level of non-stationarity, which means that
certain properties of iid series can be applied to dependent and non-stationary data.
Using (10.8), we can get asymptotic results which are close to those for ordinary
regression.


(c) Estimating the seasonality $v_t^2$. Like $u_t$, multiplicative seasonal effects
$v_t^2$ can be estimated using trigonometric polynomials using the series
$(X_t - \hat m_t - \hat u_t)^2$.


(d) Estimating the trend $s_t^2$ using the series
$(X_t - \hat m_t - \hat u_t)^2 / \hat v_t^2$. Consider the model:

$$\frac{(X_t - m_t - u_t)^2}{v_t^2} = s_t^2 Y_t^2 = s_t^2 + s_t^2 (Y_t^2 - 1) = s_t^2 + R_t.$$

We suppose that the stochastic process $R_t$ satisfies Condition (10.8) (which is
equivalent to a condition on the 4th-order cumulants of $Y_t$). Here again a control
theorem proved in [391] applies to the LOESS estimator $\hat s_t^2$ of $s_t^2$.


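Theorem 10.2 Let $k_T$ be the bandwidth parameter for the kernel $K$ used to estimate
$s^2$. Suppose that $k_T^4 + (T k_T)^{-1} = o(h_T^2)$. Then,

$$E(\hat s^2(x) - s^2(x)) = \frac{k_T^2}{2}\, (s^2)''(x) + o(k_T^2), \qquad (10.10)$$

$$V(\hat s^2(x)) \le \frac{C}{T k_T} + o\!\left( \frac{1}{T k_T} \right). \qquad (10.11)$$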
Finally, the estimated reduced series is defined by, $\forall t \in [1, T]$,

$$\hat Y_t = \frac{X_t - \hat m_t - \hat u_t}{\hat v_t\, \hat s_t}. \qquad (10.12)$$


Let us conclude this section with several important remarks.


1. The rate of convergence of the estimator of $s_t^2$ is independent of $m_t$. An optimal
choice for $h_T$ based on Theorem 10.1 is impossible to obtain since the quantities
$m''_t$ and $C_{\tilde Y}(t)$ are unknown. Note also that standard cross-validation (CV)
criteria cannot be applied to correlated and non-stationary data, and do not help
select the bandwidths $h_T$ and $k_T$. In [391], it is suggested to perform this selection
using a modified partitioned cross-validation (MPCV) algorithm, a variant of
the PCV approach proposed by [503]. Simulation studies in [391] have shown
the improved behavior of MPCV over PCV, as well as over other bandwidth
calibration algorithms in non-stationary settings, such as MCV [163], plug-in
CV [289], and block bootstrap [370]. The published study reused here chose to
set $k_T = h_T$.

2. A block bootstrap method [370] can be used to calculate confidence intervals. For
the daily temperature, 10-day blocks suffice for taking into account correlation
between temperatures on consecutive days.
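
Steps (a) to (d) can be sketched on synthetic data as follows. This is a minimal
illustration under stated assumptions, not the procedure of [191, 391]: the LOESS step
is replaced by a plain local linear smoother with a tricube kernel and a fixed bandwidth
(no robustness iterations, no MPCV bandwidth selection), and the function names, the
synthetic series, and the bandwidth of three years are all hypothetical choices.

```python
import numpy as np

def harmonic_fit(y, t, max_p=4, period=365.0):
    """Least-squares fit of a trigonometric polynomial; degree p chosen by AIC."""
    n, best = len(y), None
    for p in range(1, max_p + 1):
        cols = [np.ones(n)]
        for k in range(1, p + 1):
            cols += [np.cos(2 * np.pi * k * t / period),
                     np.sin(2 * np.pi * k * t / period)]
        A = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        fitted = A @ coef
        # Gaussian AIC up to a constant: n log(RSS / n) + 2 * (number of coefficients)
        aic = n * np.log(np.mean((y - fitted) ** 2)) + 2 * (2 * p + 1)
        if best is None or aic < best[0]:
            best = (aic, p, fitted)
    return best[1], best[2]

def local_linear(y, t, h):
    """Local linear regression with a tricube kernel of half-width h."""
    out = np.empty(len(y))
    for i, t0 in enumerate(t):
        d = t - t0
        w = np.clip(1 - np.abs(d / h) ** 3, 0, None) ** 3
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        r0, r1 = (w * y).sum(), (w * d * y).sum()
        out[i] = (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)  # fitted intercept at t0
    return out

# Synthetic daily series: seasonal mean, slow trend, seasonal standard deviation
rng = np.random.default_rng(0)
T = 6 * 365
t = np.arange(T, dtype=float)
sd = 1.0 + 0.3 * np.sin(2 * np.pi * t / 365)
x = 10 + 8 * np.cos(2 * np.pi * t / 365) + 0.001 * t + sd * rng.standard_normal(T)

h = 3 * 365                                          # bandwidth in days (assumed)
p, u_hat = harmonic_fit(x, t)                        # (a) seasonality of the mean
m_hat = local_linear(x - u_hat, t, h)                # (b) trend of the mean
centred = x - u_hat - m_hat
_, v2_hat = harmonic_fit(centred ** 2, t)            # (c) seasonality of the variance
v2_hat = np.maximum(v2_hat, 1e-8)                    # guard against non-positive fits
s2_hat = local_linear(centred ** 2 / v2_hat, t, h)   # (d) trend of the variance
y_hat = centred / np.sqrt(v2_hat * s2_hat)           # reduced series, as in (10.12)
print(p, y_hat.mean(), y_hat.std())
```

On this toy series the reduced series should be approximately centered with unit
variance, which is the property required of the residual series before a diffusion
model is fitted to it.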


10.2.5.2 Simulation Model for the Reduced Series and Taking Extreme 
Values into Account 


The reduced series $\{Y_t\}_t$, for which $E[Y_t] = 0$ and $E[Y_t^2] = 1$ (centered and
normalized) in (10.3), is potentially stationary. In reality, there may still remain some
seasonality in the correlation and in its 3rd and 4th-order moments (this is generally
the case for temperature data). Consequently, the residual series is not stationary
but cyclo-stationary, i.e., its distribution is invariant under time shifts of a whole
number of periods. Specific statistical tests are provided in [391].

We thus suppose that $Y_t$ is a stationary diffusion process satisfying a stochastic
differential equation of the type

$$dY_t = b(Y_t)\,dt + a(Y_t)\,dW_t, \qquad (10.13)$$

where $b$ is the drift term in the model, $a > 0$ a diffusion coefficient, and $W_t$ the
Brownian motion. The terms $a$ and $b$ depend on the time $t$ (day of the year), which
helps take seasonality remaining in the residual series into account. According to
(10.13), the marginal density of the process $Y_t$ is given, up to a normalizing
constant, by

$$\nu(y) \propto \frac{1}{a^2(y)} \exp\left( 2 \int^{y} \frac{b(v)}{a^2(v)}\, dv \right).$$
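
As a quick sanity check of how the drift and the diffusion coefficient determine the
marginal density, consider the Ornstein-Uhlenbeck case $b(y) = -\theta y$ with
$\theta > 0$ and constant $a(y) = \sigma$. This example is an illustrative assumption,
not the temperature model. The stationary density of a diffusion of type (10.13) is
proportional to $a^{-2}(y) \exp\left( 2 \int^{y} b(v)/a^2(v)\, dv \right)$, which gives here

$$\nu(y) \propto \frac{1}{\sigma^2} \exp\left( 2 \int_0^{y} \frac{-\theta v}{\sigma^2}\, dv \right) = \frac{1}{\sigma^2} \exp\left( -\frac{\theta y^2}{\sigma^2} \right),$$

i.e., a Gaussian density with mean 0 and variance $\sigma^2 / (2\theta)$. This example
has unbounded support and therefore does not satisfy the boundedness hypotheses
introduced next; it only illustrates the mechanics of the formula.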

The use of the theory of extreme values of bounded diffusion processes, studied by
[85] and reformulated by [200], requires that the diffusion process (10.13) satisfies
certain boundedness properties, which are not constraining in practice since the
original process $X_t$ represents a physical phenomenon with a limited amplitude
(even if its maximum is not necessarily known a priori). By necessity, the Brownian
motion $W_t$ is supposed truncated. We thus denote by $[r_1, r_2]$ the support of $Y_t$ and
make the following hypotheses (validated in [391] for the temperature):


(i) $a$ and $b$ are defined and continuous on $[r_1, r_2]$;
(ii) $b(r_1)\, b(r_2) \neq 0$.


Theorem 10.3, taken from [191], shows that under these hypotheses, if there
exists a limit distribution for the standardized maxima
$M_T = \max\{Y_t,\ 0 \le t \le T\}$, then one also exists for an iid sample from the
marginal distribution $\nu$ of the process, and these two limit distributions are GEVs
with the same shape parameter $\xi$. This result is valid for bounded processes where
$\xi < 0$, but an extension to cases where $\xi > 0$ is also possible.


Theorem 10.3 (Maxima of a diffusion process)
Under Hypotheses (i) and (ii):


1. If the distribution of the maximum of the diffusion process $Y_t$ is in the domain of
attraction of a GEV distribution with $\xi < 0$, then the marginal distribution $\nu$ of
the diffusion process belongs to the same max-domain of attraction.

2. Furthermore,

$$a^2(y) = -2\, b(r_2)\, \xi'\, (r_2 - y) + o(r_2 - y) \quad \text{when } y \to r_2,$$

where $1/\xi + 1/\xi' = 1$.


To fit the model to the dynamics and other intrinsic characteristics of the residual
temperature, a discrete version of the diffusion needs to be used [191]. The
continuous-time residual temperature $Y_t$ is thus represented by a seasonal and
heteroscedastic* autoregressive model

$$Y_t = c(Y_{t-1}) + a(Y_{t-1})\, \eta_t, \qquad (10.14)$$

with

• $c(Y_{t-1}) = P_p(t)\, Y_{t-1}$, where $P_p(t)$ is a $p$-th order trigonometric function
with a period of $A = 365$ days,
• $\eta_t$ an iid truncated Gaussian noise.


Seasonality and the existence of upper and lower bound constraints require the
conditional variance $a^2(Y_{t-1})$ to be defined as follows:

$$a^2(Y_{t-1}) = (r_2 - Y_{t-1})(Y_{t-1} - r_1) \sum_{k=0}^{5} P_{p',k}(t)\, Y_{t-1}^k,$$

$$\frac{da^2}{dy}(r_1) = \frac{2\,(c(r_1) - r_1)}{1 - 1/\xi}, \qquad \frac{da^2}{dy}(r_2) = \frac{2\,(c(r_2) - r_2)}{1 - 1/\xi}.$$
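
The constraint at the upper bound can be related to Theorem 10.3 by a short
calculation. It is sketched here under an assumption that the text does not state
explicitly: the drift of the diffusion and the autoregressive function are linked by an
Euler discretization with unit time step, so that $b(y) \approx c(y) - y$. Differentiating
the expansion $a^2(y) = -2\, b(r_2)\, \xi'\, (r_2 - y) + o(r_2 - y)$ at $y = r_2$, and using
$1/\xi + 1/\xi' = 1$, i.e., $\xi' = 1/(1 - 1/\xi)$, gives

$$\frac{da^2}{dy}(r_2) = 2\, b(r_2)\, \xi' = \frac{2\, b(r_2)}{1 - 1/\xi} \approx \frac{2\,(c(r_2) - r_2)}{1 - 1/\xi}.$$

For $\xi < 0$ we have $\xi' = \xi/(\xi - 1) > 0$, so when the drift at the upper bound
points back into the interval ($b(r_2) < 0$), this derivative is negative: the conditional
variance decreases to zero as $y$ approaches $r_2$.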


Finally, consistent statistical estimation of the model (10.14) can be obtained in
the following way. For technical details, refer to [191].


(a) Estimating the autoregressive part of $Y_t$. The number $p$ of cos and sin terms
in the trigonometric function $P_p(t)$ is chosen using AIC (see Sect. 8.1.2.3), and
the parameters of $P_p$ estimated using a least squares method (cf. Eq. 10.17) on
the data $\hat Y_t$ produced by (10.12).

(b) Choosing the number of dimensions for $P_{p'}$. The number of cos and sin terms
$p'$ in $P_{p'}$ is also chosen using AIC, based on the model without boundedness
constraints:

$$\sum_{k=0}^{5} P_{p',k}(t)\, Y_{t-1}^k = (Y_t - \hat c(Y_{t-1}))^2.$$

(c) Estimating the bounds $r_1$ and $r_2$ by applying a GEV model $(\mu, \sigma, \xi)$ to the
maxima of $Y_t$ and a GEV model $(\mu', \sigma', \xi')$ to the minima of $Y_t$.

(d) Estimating the parameters of $P_{p'}$ on the interval $[r_1, r_2]$ using the series
$(Y_t - \hat c(Y_{t-1}))^2$ and maximum likelihood with boundedness constraints.
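
Step (c) relies on a standard property of the GEV distribution: when the shape
parameter is negative, the support has a finite upper endpoint equal to
$\mu - \sigma/\xi$. The sketch below turns fitted GEV parameters into bound estimates.
The function name and the numerical values are hypothetical, and the treatment of the
lower bound (fitting a GEV to the maxima of $-Y_t$ and changing the sign) is one common
convention assumed here, not necessarily the one used in [191].

```python
def gev_upper_endpoint(mu, sigma, xi):
    """Finite upper endpoint mu - sigma / xi of a GEV distribution with xi < 0."""
    if sigma <= 0:
        raise ValueError("the scale parameter must be positive")
    if xi >= 0:
        raise ValueError("the upper endpoint is finite only when xi < 0")
    return mu - sigma / xi

# Hypothetical parameters fitted to the block maxima of the residual series
r2_hat = gev_upper_endpoint(mu=2.5, sigma=0.4, xi=-0.25)

# Hypothetical parameters fitted to the block maxima of the sign-changed series -Y
r1_hat = -gev_upper_endpoint(mu=2.8, sigma=0.5, xi=-0.2)

print(r1_hat, r2_hat)
```

In practice the parameters would come from a maximum likelihood fit to block maxima,
and the resulting bounds inherit the sensitivity of that fit to the block size.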


10.2.6 Case Study 1: Extreme Air Temperatures 


The modeling methodology described in the previous paragraphs was applied to
temperature time series taken at three locations: Berlin (Germany), Fruholmen
(Norway), and Death Valley (USA). Here, we illustrate putting simulation and validation
procedures into practice to test model performance.

To validate the model, 100 trajectories of the same length as the actual sample 
are simulated according to the model using the procedure described below. A range 
of relevant functions can then be used to compare the simulations and actual data: 
moments, daily distributions, quantiles, GEV parameters for the maxima of Y,, cold 
snaps, and heat waves. 


Simulation procedure.


• Preprocessing: estimate the trends $(m_t, s_t)$ and seasonal effects $(u_t, v_t)$ in the
mean and standard deviation of the original time series $X_t$.
• Calculate $\hat Y_t = (X_t - \hat u_t - \hat m_t)/(\hat s_t \hat v_t)$ and apply model
(10.14) to $\hat Y_t$.
• Estimate $a(y)$ and $c(y)$.
• Use the estimators $\hat a(y)$ and $\hat c(y)$ to construct 100 simulated trajectories
$Y^*_t$.
• Obtain 100 simulated trajectories of $X$ using the model:

$$X^*_t = \hat s_t\, \hat v_t\, Y^*_t + \hat m_t + \hat u_t.$$


With regard to daily temperature distributions, 365 Kolmogorov-Smirnov tests
made it possible to test the homogeneity of the observed and simulated distributions.
For both measurement stations, homogeneity was not rejected at the 5% significance
level. Moreover, the simulations of the daily means and variances represent
the characteristics of the observations well; the results in Fig. 10.1 are for the
minimum daily temperature in Fruholmen: mean values and the 2.5% and 97.5% quantiles
of the estimators provide the lines for the mean estimator and its 95% confidence
interval.

Fig. 10.2 Comparison between the daily skewness (top) and kurtosis (bottom) of the data recorded
at Fruholmen and simulated data (95% confidence interval). The actual data are plotted as a solid
line


As for the skewness* and kurtosis* of the daily distributions, the results are not as 
good as for the daily means and variances. We see in Fig. 10.2 (minimum temperatures 
in Fruholmen) that the indicators for the 42nd and 355th days are quite low or high. 
For these days, indeed, extremely low temperatures were recorded: —21.6°C and 
—24.5°C. 

In order to confirm the usefulness of the model, simulations from it can be
compared to those from simpler models, e.g., ones with a constant function $a(y)$ or a
trigonometric function $a^2(t)$ which depends only on time and not the state $y_{t-1}$.

The example simulating the minimum daily temperature in Berlin (Fig. 10.5)
compares the model’s performance with that of the other models for most of the
quantiles. The extreme quantiles seem to be well represented by our model since
the observed quantiles are located within the confidence interval of the simulations
defined for a corresponding level.

Compared with other temperature generators in the literature, this model differs
in that it is bounded. This a priori feature leads to better reproduction of the extreme
values. To see whether the obtained trajectories have good extreme value
characteristics, the parameters of the GEV distribution can be estimated from the maxima
and minima of the residual series of simulated trajectories. The results (illustrated
in Fig. 10.3) show that the shape parameter $\xi$ is better estimated than the location
$\mu$ and scale $\sigma$, bearing in mind that the estimation of extreme value parameters is
extremely sensitive to the block size.

We can also evaluate the capacity of these models to reproduce cold snaps and 
heat waves. The cold snaps considered here are defined as a set of consecutive days 
during which the daily minimal temperature is less than the 2% quantile of the 
daily minimal temperature distribution, and heat waves as sets of consecutive days 
whose maximal daily temperature is above the 98% quantile of the daily maximal 
temperature distribution. The length of such episodes essentially varies from 1 to 
15 days, with a small number lasting longer than 15 days. In practice, the 2% and 
98% quantiles are calculated using the actual time series data, and the frequency of 
periods of all lengths are compared to the minimal, maximal, and mean frequencies 
(for each length) obtained from the 100 simulated trajectories. The results are good 
in general; while the model tends to overestimate the frequency of periods lasting 
one day only, it is able to reproduce longer periods with reasonable frequencies with 
respect to the observed ones. Figure 10.4 illustrates this on cold snaps in Berlin and
heat waves in Death Valley. Return levels can also be used as validation criteria, but
we do not detail this further here; see [595] for a discussion.
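
The episode definition used above (consecutive days beyond a quantile threshold) can
be sketched as follows. The function name and the toy series are hypothetical; in
practice the threshold would be the 2% (or 98%) empirical quantile of the observed
series, for example `np.quantile(observed, 0.02)`, and the same function would be
applied to each simulated trajectory.

```python
import numpy as np
from collections import Counter

def spell_lengths(series, threshold, below=True):
    """Lengths of maximal runs of consecutive days beyond the threshold."""
    flags = series < threshold if below else series > threshold
    lengths, run = [], 0
    for flag in flags:
        if flag:
            run += 1
        elif run:
            lengths.append(run)   # a run has just ended
            run = 0
    if run:
        lengths.append(run)       # run still open at the end of the series
    return lengths

# Toy daily minimum temperatures with a fixed cold-snap threshold of -7
tmin = np.array([3.0, -8.0, -9.0, 1.0, 2.0, -7.5, 0.0, -10.0, -11.0, -12.0])
cold = spell_lengths(tmin, threshold=-7.0)
freq = Counter(cold)              # number of episodes of each length
print(cold, dict(freq))
```

Comparing the observed `freq` with the minimal, maximal, and mean frequencies over the
simulated trajectories gives the validation criterion described in the text.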


10.2.7 Case Study 2: Extreme Values of a Frost Index 


The frost index characterizes the strength of a freezing episode. Such episodes are 
defined as sets of consecutive days on which the daily mean temperature remains 
below 0°C. The frost index then corresponds to the absolute value of the temperature 
gap with respect to 0°C over an episode [754]. 
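
A minimal sketch of this definition is given below. It assumes that the index of an
episode is the sum over its days of the absolute value of the daily mean temperature
(a degree-day accumulation); this reading of the definition, the function name, and the
toy data are assumptions rather than details taken from [754].

```python
import numpy as np

def frost_indices(tmean):
    """One index per freezing episode: sum of |T| over consecutive days with T < 0."""
    indices, current, in_episode = [], 0.0, False
    for temp in tmean:
        if temp < 0:
            current += -float(temp)   # accumulate the gap to 0 degrees C
            in_episode = True
        elif in_episode:
            indices.append(current)   # the episode ends on the first non-freezing day
            current, in_episode = 0.0, False
    if in_episode:
        indices.append(current)       # episode still open at the end of the series
    return indices

temps = np.array([1.0, -2.0, -3.0, 0.5, -1.0, 2.0])
print(frost_indices(temps))
```

On this toy series there are two freezing episodes, with indices 5.0 and 1.0.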

Here, we look at the daily mean temperature recorded in Colmar, France from 
1957 to 2012 (56 years). During this time, 283 frost indices were recorded—on 
average 5.1 per year, which makes applying extreme value theory unsuitable. The 
largest observed value of the frost index was 418.2 during the winter of 1962-1963, 
and the second largest was 189.6 during the 1985-1986 winter. In order to estimate 
return levels for frost indices, it may therefore be useful to stochastically simulate 
the daily mean temperature. 

By simulating the daily temperatures over a 56-year period 1000 times with the
model calibrated to the observed data, we obtain a total of 291,446 frost indices, i.e.,
5.2 per year on average, which is consistent with the 5.1 observed in the real data.
Among these indices, 11,449 are above 100, i.e., a proportion of 0.039, in comparison
to the observed proportion of 0.042 (12 in 56 years). A comparison of the numbers
of simulated indices per year and the annual maxima in Fig. 10.6 also supports the
relevance of the model, with 3 observed indices found outside the 90% confidence
interval of the simulated ones, compared to the number expected: 2.8 (5% above and
5% below).

Fig. 10.3 Extreme value parameters estimated using the observed residual time series (vertical
lines) and their empirical distributions constructed using the minimal daily temperature simulations
for Berlin

It is then possible to estimate a return level for frost indices in two ways:


• Estimate the corresponding quantile of the distribution of 291,446 simulated
indices; i.e., we look for the level exceeded on average once every $N$ years, which
corresponds to the quantile $1 - 1/(n_p \cdot N)$, where $n_p$ is the mean number of indices
per year of this distribution. For a return period of 1000 years, this approach leads
to a value of 391 and the 95% confidence interval [376.2, 404.5].

• Apply extreme value statistics theory by fitting a GEV distribution to the 1000
maxima of each 56-year long simulation (the maximum is then taken on average
over 56 × 5 = 280 values, which seems more reasonable). For a return period of
1000 years, this approach gives a value of 386.2 and a 95% confidence interval of
[376.7, 395.7].
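
The first approach can be sketched as follows. With the figures quoted above
(291,446 indices over 56,000 simulated years, $N = 1000$), the quantile level is
$1 - 1/(5.204 \times 1000) \approx 0.99981$. The function name and the placeholder sample
below are hypothetical; the exponential draws merely stand in for simulated frost
indices.

```python
import numpy as np

def return_level_from_simulations(indices, n_years_simulated, return_period):
    """Empirical return level: quantile of order 1 - 1 / (n_p * N) of the indices."""
    n_p = len(indices) / n_years_simulated      # mean number of indices per year
    level = 1.0 - 1.0 / (n_p * return_period)   # quantile order
    return n_p, level, float(np.quantile(indices, level))

# Placeholder sample standing in for simulated frost indices (not real data)
rng = np.random.default_rng(1)
sim = rng.exponential(scale=30.0, size=52_000)

n_p, level, z = return_level_from_simulations(sim, n_years_simulated=10_000,
                                               return_period=100)
print(n_p, level, z)
```

A confidence interval for the return level can then be obtained by resampling, which
this sketch does not attempt.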
Fig. 10.4 Observed frequencies (solid line) and maximal, minimal, and mean frequencies (dotted
lines) obtained from simulating cold snaps in Berlin (top) and heat waves in Death Valley (bottom)

Fig. 10.5 Observed quantiles (vertical lines) of the daily minimal temperature in Berlin and their
respective empirical distribution constructed using different models: constant $a(y)$ (dotted lines),
trigonometric $a^2 = f(t)$ (dashed lines), and Model (10.14) (solid lines)

Fig. 10.6 Number of frost indices (top) and maximal index (bottom) per year recorded in Colmar
(solid line) with a 90% confidence interval provided from simulations (solid line, center), and the
simulated maxima (dotted line)
10.3 Physics-Based Simulation of Natural Hazards


10.3.1 Computer Models 


The obvious lack of data on natural hazards, and sometimes the difficulty of obtaining
a fine description of them through stochastic processes, has prompted the search for an
alternative to direct statistical and stochastic modeling. An indirect approach
is the following: stochastic modeling of the most uncertain yet influential
parameters $\phi$ (input parameters) in the dynamic equations leading to the appearance of
extreme events, provided such equations can be simulated. These equations
correspond to deterministic, discrete, or continuous modeling of events [331], and we call
the computer implementation of such models numerical modeling, computer modeling,
or computer codes. An important review of dynamic models, illustrated by
examples from climatology, seismology, and socio-economics, can be found in the article
[331]. The use of such models to characterize potential causes and consequences
of extreme natural events such as earthquakes, tsunamis, severe storms, avalanches,
and even meteorite strikes is discussed in [13] (Sect. 5.10). In particular, hydrological
(rainfall) forecasts generally benefit greatly from such tools [323, 447, 536].

Let us return to the example in the introduction of this chapter. The flow of a
watercourse can be understood as a calculable variable whose values are produced
by an algorithm that implements a structured set of equations $X = g(\phi)$ from free
surface fluid dynamics [27, 37]. The upstream flow, surface roughness of the
submerged earth in a section of the watercourse [120], and the parameters describing the
geometry of the river along the section in question ([785], Sect. 2.17) form part of the
explanatory parameters $\phi$ for the downstream flow. The approximate resolution of
fluid dynamics equations, in which the watercourse is approximated as a curvilinear
abscissa [120] or as a more precise multidimensional mesh [213], necessarily
implies model uncertainty with respect to $g$, to which is added uncertainty in the value
of influential parameters [559]. For instance, the measured flow generally comes with
a non-negligible measurement error [218]; also, the geometry is summarized using
a small number of parameters [548]; finally, there are no friction measurements that
can help to assess the corresponding input parameters. More generally, flow and
friction are variables subject to a mixture of epistemic and random uncertainty (see
[393] and Sect. 3.1.2). The epistemic part is linked to the fact that our knowledge of
physics, corresponding to the level of model sophistication, improves over time.
The intrinsic variability in the value of the parameters at any physical point is a
signal of random uncertainty. Note that the theoretical model $g$ is itself subject to
an epistemic-type error with respect to the reality of the phenomenon $X$, while its
numerical version inevitably adds approximations and (supposedly under control)
rounding errors.

The representation of epistemic and random uncertainty with the same tools
(stochastic processes or probability distributions) remains controversial [278, 392,
568, 671]. A rapid review of methodological principles alternative to
probability-based ones can be found in [807]. For practical reasons [658] but also
epistemological ones ([240, 712] and Sect. 3.1.2), the joint representation of uncertainties
through probability tools remains the predominant approach in risk analyses. This is
evidenced by the most recent attempts to formalize procedures for the exploration and
use of complex computer codes $g$ that model physical phenomena and whose inputs
$\phi$ contain uncertainty (so-called verification, validation, and uncertainty
quantification approaches [13]). This predominance concerns in particular most of the studies
taking place under regulatory frameworks [531]. The probabilistic modeling of
influential parameters $\phi$ thus makes it possible to simulate the variable of interest $X$,
find values for $\phi$ which lead to extreme values in $X$, then calculate risk indices.

Studies seeking to treat epistemic uncertainty differently, through the use of
modeling derived from non-probability-based theories,² are still essentially research
topics and not fully accepted methodologies that can become predominant today in
regulatory settings. The main reason is computational complexity, which can rapidly
become prohibitive with respect to the dimension of $\phi$ [753, 769]. Let us however
mention the article [663], which studies the feasibility of the possibilistic approach
[231] to represent a lack of knowledge of the explanatory parameters of earthquakes,
[458], which estimates extreme sea levels with the same approach, and also [604],
which considers a case study similar to that of Example 10.2 (further in the text)
and proposes to design dikes with the joint use of probabilistic and possibilistic
tools. This type of approach has been gaining ground in risk studies since the turn of
the twenty-first century [61, 368]. Nevertheless, formally speaking, these methods do
not make it possible to define typical notions in natural hazard management like return
levels and periods. Rather, they provide bounds in extreme value situations, given as
solutions to deterministic problems, often in a quite conservative manner [238].


Example 10.1 (Predicting tsunamis caused by meteorite impact) 


Simplistic, pseudo-physical models can also be used to evaluate the probability
(or return period) of an extreme-valued event. The prediction of characteristics
of a tsunami caused by a meteorite impact is a prime example. The methods
proposed in [583, 775] involve products of probabilities of occurrence

$$P(\text{meteorite impact during } [t_0, t_1]) \times P(\text{meteorite size} > 100\,\text{m}) \times \ldots \qquad (10.15)$$

with physics-based impact models. For example, [583] has proposed modifying
(conservatively) a model that correctly predicted the consequences of the
impact of the Shoemaker-Levy 9 comet on Jupiter in 1994. The two components
of this type of study are subject to extremely high epistemic uncertainty.
Thus, approaches like (10.15) rely on mechanisms similar to probabilistic safety
assessments (PSA, [775]) which are used to rank extreme accident scenarios in
industry (nuclear in particular), rather than quantifying them precisely [237].
Furthermore, the error in modeling such impacts obviously remains very difficult
to estimate in the (fortunate) absence of observed data. Nevertheless, such
models are the only ones available today, and at minimum a stochastic exploration
of them can help compare this type of extreme event with other more
frequent ones, in terms of risk (cost × probability).


² Often called extra-probabilistic theories.


10.3.2 Monte Carlo-Based Exploration and Extreme Risk 
Calculation 


The most general approach available for estimating extreme event occurrence
probabilities or the corresponding return levels using black box numerical models is
based on Monte Carlo sampling [673]. The following example introduces this
exploration principle (see also Figs. 10.7 and 10.8).


Fig. 10.7 Two-dimensional illustration: the probability $p$ of an extreme event can be interpreted
as the volume (i.e., the area in two dimensions) of a small grey zone. A numerical experimental
design (black dots) must surround the surface of this zone in order to correctly estimate its
volume


Fig. 10.8 Under monotonic constraints on $g$, it is possible to bound the boundary $g(\phi) = x_0$ by two
hyper-stairs [115, 659]. The two hatched volumes then allow two sure bounds on the probability $p$
(grey area) to be determined. The same reasoning can be used to bound a quantile $z$ [117]


Example 10.2 (Flood probability and return period)

Suppose that $X$ represents a water level (typically obtained through a transformation
of the flow using a calibration curve; see Example 4.4), and that there exists a
deterministic hydraulic model $g$ such that $X = g(\phi)$, where $\phi$ is random with
density $f$ on $\Phi$, and $\dim \Phi = d$. Let $x_0$ be the height of a dike. If we suppose that $g$ is
valid for representing $X$ in some domain $\chi = g(\Phi)$, and that $x_0 \in \chi$, then the flood
probability (overflow) is given by:

$$p = P(X > x_0) = \int_{\Phi} 1_{\{g(\phi) > x_0\}}\, f(\phi)\, d\phi, \qquad (10.16)$$

and can be estimated using a Monte Carlo method: $\hat{p}_m \approx p$, where

$$\hat{p}_m = \frac{1}{m} \sum_{k=1}^{m} 1_{\{g(\phi_k) > x_0\}} \qquad (10.17)$$


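and $\{\phi_1, \dots, \phi_m\} \overset{iid}{\sim} f$, with $m \gg 1$. Estimator (10.17) is unbiased and is such that

$$\frac{\hat{p}_m - p}{\sigma} \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1), \qquad (10.18)$$

where the variance $\sigma^2 = p(1 - p)/m$ is estimated by $\hat{\sigma}_m^2 = \hat{p}_m(1 - \hat{p}_m)/(m - 1)$.
An asymptotic confidence interval of level $1 - \alpha$ for $p$ is then

$$\left[\hat{p}_m - \hat{\sigma}_m\, \Phi_{\mathcal{N}}^{-1}(1 - \alpha/2),\ \ \hat{p}_m + \hat{\sigma}_m\, \Phi_{\mathcal{N}}^{-1}(1 - \alpha/2)\right],$$

where $\Phi_{\mathcal{N}}$ denotes the standard Gaussian cumulative distribution function.
If conversely we are looking for the $1/p_0$ return level for $X$, denoted $z$, then

$$z = \inf\{x \in \chi,\ P(g(\phi) \le x) \ge 1 - p_0\} \qquad (10.19)$$

can be consistently estimated by

$$\hat{z}_m = X^*_{\lfloor m(1 - p_0) \rfloor + 1}, \qquad (10.20)$$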
where $X_k = g(\phi_k)$, and $X_1^* \le \dots \le X_m^*$ are the order statistics of $X_1, \dots, X_m$ [44]. When
the density $f_X$ of $X = g(\phi)$ is differentiable at $z$, the bias of the estimator $\hat{z}_m$
is:

$$E[\hat{z}_m] - z = -\frac{p_0(1 - p_0)\, f_X'(z)}{2(m + 2)\, f_X^3(z)} + O(1/m^2),$$

which becomes negligible when $m \to \infty$. The estimator is also asymptotically Gaussian [199]:

$$\sqrt{m}\,(\hat{z}_m - z) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0,\ \frac{p_0(1 - p_0)}{f_X^2(z)}\right). \qquad (10.21)$$

As the density $f_X$ remains unknown, it must itself be estimated from the sample
in order to estimate the variance of $\hat{z}_m$, and in particular it must be estimated well in its tail
$x \approx z$, which requires a large sample size.
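To make Example 10.2 concrete, the following minimal Python sketch implements the estimators (10.17) and (10.20) together with the asymptotic confidence interval for $p$. The model `g`, the Gaussian input distribution in `sample_phi`, and all numerical values are hypothetical stand-ins chosen only for illustration; they are not a real hydraulic code.

```python
import math
import random
from statistics import NormalDist


def g(phi):
    """Hypothetical stand-in for the hydraulic model: water level from two inputs."""
    flow, friction = phi
    return flow + 0.5 * friction


def sample_phi(rng):
    """Draw one input vector phi ~ f (assumed here: two independent standard Gaussians)."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def mc_exceedance(x0, m, rng, alpha=0.05):
    """Estimator (10.17) and its asymptotic (1 - alpha) confidence interval."""
    hits = sum(g(sample_phi(rng)) > x0 for _ in range(m))
    p_hat = hits / m
    # Estimated standard deviation of the estimator itself
    sigma_hat = math.sqrt(p_hat * (1.0 - p_hat) / (m - 1))
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma_hat
    return p_hat, (p_hat - half_width, p_hat + half_width)


def mc_return_level(p0, m, rng):
    """Estimator (10.20): order statistic of rank floor(m (1 - p0)) + 1."""
    xs = sorted(g(sample_phi(rng)) for _ in range(m))
    rank = math.floor(m * (1.0 - p0)) + 1  # 1-based rank
    return xs[min(rank, m) - 1]
```

With this toy model, $X$ is Gaussian with variance 1.25, so the outputs of `mc_exceedance(2.5, 20000, random.Random(0))` and `mc_return_level(0.01, 20000, random.Random(0))` can be compared with the exact values (roughly 0.0127 and 2.60).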

An alternative approach to (10.20) is quantile regression [126, 441], which in general
supposes that $z$ is a linear function of the vector of variables $\phi$, with coefficients
$\beta$. We define, theoretically:

$$\beta = \arg\min_{b}\ E\left[\rho(X - \phi^T b)\right], \qquad (10.22)$$

where $\rho(y) = y\,(1 - p_0 - 1_{\{y < 0\}})$. An estimator of the coefficients $\beta$ can be obtained
by minimizing the empirical estimator of the objective function in (10.22) produced 
by Monte Carlo sampling. However, this approach has for the most part been reserved 
for real experimental designs (not numerical ones) and the evolution of quantiles 
along trajectories measured in situ [794]. We also cite [134], which suggests applying 
this approach to calculating extreme sea levels. 


10.3.3 Robustness and Calculation Time 


Exploration of the model $g$ (or rather its encoded version) by simulating the parameter
vector $\phi \sim f$ is the most-used approach to estimate the quantities of interest in
Example 10.2. The aim is to make a minimum of hypotheses on $g$ so that the results are as
general as possible (non-intrusive approaches). In the absence of any regularity
hypothesis or even the assumption of continuity, $g$ can be seen as a “black box”, and
the Monte Carlo approach described in Example 10.2 gives powerful results whose
precision is independent of the number of dimensions $d$. This wish for a minimum of
hypotheses results most often from the fact that $g$ models a chain of several physical
processes, increasing the risk of edge effects (cf. Sect. 10.1).

However, this standard approach comes with a major weakness, in practice, when
$p \ll 1$. In such cases a Monte Carlo approach is not robust [341], in the sense that
the estimation error on $\hat{p}_m$, given in terms of its coefficient of variation:

$$CV[\hat{p}_m] = \frac{\sqrt{V[\hat{p}_m]}}{E[\hat{p}_m]} = \sqrt{\frac{1 - p}{m\,p}},$$

tends to infinity when $p \to 0$ for fixed $m$. Obtaining a typical precision (a coefficient of
variation of about 10%) for $p \approx 10^{-q}$ requires at least $m = 10^{q+2}$ calls to $g$. This need for
a large number of calls also affects the estimation of the quantile (10.20). Moreover, continual
improvement in the precision of numerical models leads to an increased calculation cost for each run
of $g$, which can make it prohibitive to calculate the estimators $\hat{p}_m$ and $\hat{z}_m$. This is
particularly true for climate models [633].
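The order of magnitude quoted above follows directly from the variance $p(1 - p)/m$ of the Monte Carlo estimator. As a small illustration, the sketch below computes the coefficient of variation and the number of runs needed to reach a target value of it; the function names are ours and purely illustrative.

```python
import math


def mc_coefficient_of_variation(p, m):
    """Coefficient of variation sqrt((1 - p) / (m p)) of the Monte Carlo estimator."""
    return math.sqrt((1.0 - p) / (m * p))


def required_runs(p, target_cv=0.1):
    """Smallest m such that the coefficient of variation does not exceed target_cv."""
    return math.ceil((1.0 - p) / (p * target_cv ** 2))
```

For instance, `required_runs(1e-4)` is close to one million, in line with the rule of thumb of two orders of magnitude more runs than $1/p$.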

Computing time and memory cost problems can be greatly diminished by per- 
forming parallel calculations (in particular for climate models [495, 514]) or using 
supercomputers; see, for example, [756] for applications involving heavy rainfall due 
to planetary oscillations, earthquakes, and tsunamis. Moreover, a number of accelerated
Monte Carlo methods [672] can be used to significantly decrease calculation
costs by numerically “zooming” into a zone $\Phi' \subset \Phi$ where $P(g(\phi) > x_0 \mid \phi \in \Phi')$
is a non-extreme probability. A very brief summary of such approaches usable by
engineers is presented later in this section. As this area of research has been rapidly 
evolving since the year 2000, readers interested in a more comprehensive take on the 
subject should consult [672]. 


10.3.3.1 Conservative Estimates of Probabilities and Return Levels 


Starting from a Monte Carlo sample, Wilks [787], later extended by [773], proposed
an empirical estimate of a return level $z$, described in (10.19) as a quantile of order
$1 - p_0$, though at a high computational cost. This is a conservative approach to
calculating the quantile (10.19) in the sense that, with high probability $1 - \beta > 0$, it
does not underestimate the risk when the model $g$ has to be explored using numerical
trials. It is therefore an important element in certain safety demonstrations in the
structural reliability field [564, 806], and can be used to help mitigate certain
extreme-valued natural phenomena.

From (10.21), we can deduce that asymptotically the estimator $\hat{z}_m$ satisfies
$P(\hat{z}_m > z) \approx 1/2$. To increase this probability to $1 - \beta$, we can replace the
calculation (10.20) by

$$\hat{z}_m = X^*_{\lfloor m(1 - p_0) \rfloor + r}, \qquad (10.23)$$

where the integer $r \ge 1$ can be selected optimally as the smallest positive integer $r_0$
for which [564, 787]

$$\sum_{k=0}^{m p_0 - r} \binom{m}{k}\, p_0^k\, (1 - p_0)^{m-k} \le \beta.$$

The estimator (10.23) defined by $r = r_0$ is called the Wilks estimator, and satisfies

$$P(\hat{z}_m > z) \ge 1 - \beta.$$

This estimator is defined for any sample size $m$ above or equal to some minimum
sample size $m_c$ defined by the limit case:

$$m_c = \left\lfloor \frac{\log \beta}{\log(1 - p_0)} \right\rfloor + 1.$$

To give an example, for $p_0 = \beta = 5\%$ a minimum value of $m_c = 59$ runs is
required to obtain an upper bound for the quantile or return level $z$.
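The selection of the Wilks rank can be sketched in a few lines of Python. The sketch below works with the exact binomial probability that at most $m - \text{rank}$ runs exceed $z$ (the counterpart of the sum above, written with an exact integer upper index); the function names are illustrative assumptions, not those of an existing package.

```python
import math


def wilks_min_sample_size(p0, beta):
    """Smallest m with (1 - p0)^m <= beta; the sample maximum is then conservative."""
    m = 1
    while (1.0 - p0) ** m > beta:
        m += 1
    return m


def prob_underestimate(m, p0, rank):
    """P(X*_rank < z): probability that at most m - rank runs exceed z."""
    return sum(math.comb(m, k) * p0 ** k * (1.0 - p0) ** (m - k)
               for k in range(m - rank + 1))


def wilks_rank(m, p0, beta):
    """Smallest 1-based rank, starting from floor(m (1 - p0)) + 1, whose order
    statistic exceeds z with probability at least 1 - beta (None if m is too small)."""
    start = math.floor(m * (1.0 - p0)) + 1
    for rank in range(start, m + 1):
        if prob_underestimate(m, p0, rank) <= beta:
            return rank
    return None
```

With $p_0 = \beta = 5\%$, `wilks_min_sample_size(0.05, 0.05)` returns 59 and `wilks_rank(59, 0.05, 0.05)` selects the sample maximum, while 93 runs allow the second largest value to be used.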

Several improvements have been made to this approach to help bound quantiles 
and increase the “strength” $1 - \beta$ of the conservatism; these can be found notably in
[271, 722]. Note also that it can be shown that a conservative estimator of a quantile 
z obtained using an empirical bootstrap is similar to that of the Wilks estimator. 
Practical tips for using this approach are provided in [20] (Sect. 4.3.22). However, 
remember that this conservatism is not certain, and the approach can also strongly 
overestimate the risk when the size of the sampling design is highly constrained, 
since this estimator suffers from high dispersion [405]. 

A more sophisticated approach is the monotonic reliability method (MRM), 
which provides both statistical estimation (of the accelerated Monte Carlo type—see 
Sect. 10.3.4) and bounding of a certain probability or quantile, and works by taking 
into account a monotonic constraint on g. To continue with Example 10.2, it may 
be realistic, or conservative, to suppose that the true water level (and its model g) 
increases when the flow and/or friction does. If so, the regions of $\Phi$ close to the limit
state surface $\{\phi \in \Phi,\ g(\phi) = x_0\}$ can be quickly contoured using two numerical
experiment designs. The measure of the space enclosed by these two designs can 
be used to bound a probability or quantile. This approach is quite well developed 
for estimating and bounding a given probability p [115, 119, 533, 659], and much 
more robust and rapid than Monte Carlo—though restricted to low dimensions (less 
than 10). Adapting it to be used to estimate and bound quantiles is an active area of 
research [117]. 
10.3.4 Accelerated Monte Carlo Methods


Accelerated Monte Carlo methods broadly encompass all probability (10.16) and 
quantile (10.19) estimation methods based on stochastic or deterministic sampling 
of the space $\Phi$ which are more efficient than classical Monte Carlo ones. What
“efficiency” means depends on the context. When sampling is stochastic, it principally
refers to the rate of convergence of the estimator to a Gaussian distribution (i.e., the 
inverse of its asymptotic variance) and its robustness as defined previously. When 
sampling is deterministic, the notion of discrepancy plays the role the rate of conver- 
gence does in the probabilistic setting. In general, efficiency is linked to the number 
of calls to g required to obtain a certain estimation precision. In this section, we 
provide a rapid overview of the major accelerated Monte Carlo methods that can be 
used when dealing with extreme probabilities and quantiles. Readers interested in 
more technical details should turn to the book [672] and review article [525]. 


10.3.4.1 Improving Numerical Experiment Designs 


A first type of acceleration involves modifying the sample (or numerical experiment
design) $u_1, \dots, u_m \overset{iid}{\sim} \mathcal{U}_{[0,1]^d}$, whereby $\{\phi_1, \dots, \phi_m\} = \{F^{-1}(u_1), \dots, F^{-1}(u_m)\}$, where
$F(\cdot, \dots, \cdot)$ is the joint cumulative distribution function of $\phi$. Figure 10.9 (left)
illustrates a well-known phenomenon in standard Monte Carlo sampling: due to the
necessarily finite sample size, certain zones are not filled, unlike with a regular
grid (right), though the latter comes with a computational cost in runs of $g$
that grows exponentially with the number of dimensions $d$. Several classes of methods have been
developed to carry out a more complete exploration of a space $\Phi$ (or $[0, 1]^d$) under
the constraint of at most $m$ runs of $g$. Such methods are generally called space filling
designs [193], and can also be separated into deterministic and stochastic approaches.
Fig. 10.9 Standard Monte Carlo sampling on $[0, 1]^2$ (left) and a regular 2-dimensional grid (right)

Fig. 10.10 Example of two Halton sequences in dimension 2


Deterministic approaches. These approaches involve constructing a deterministic
sampling of the space $\Phi$ (or $[0, 1]^d$) using sequences with uniform properties, then
calculating pseudo-statistical estimators using classical formulas (10.16) and (10.19),
and deterministic error bounds [476]. Going under the name of quasi-Monte Carlo
methods, these sequences $u_1, \dots, u_m$ (in the space $[0, 1]^d$) can be generated in a way
that optimizes a deviation from uniformity criterion called $L^2$-norm discrepancy,
which is of the form:

$$D^2(u_1, \dots, u_m) = \int_{[0,1]^d} \left[ \frac{1}{m} \sum_{i=1}^{m} 1_{\{u_i \in [0, y)\}} - \text{Volume}([0, y)) \right]^2 dy,$$

where $0$ is the origin $(0, \dots, 0)$ of the space $[0, 1]^d$, and $[0, y) = [0, y_1) \times \dots \times [0, y_d)$.
This criterion compares the volume of all of the sub-intervals
(anchored at the origin) of the domain of the $u_i$ with the proportion of points
contained in each. Robust choices for discrepancy and anchoring with respect to inputs
as the number of dimensions decreases were studied in [193]. To obtain faster
estimators (less costly in the number of calls to $g$) than Monte Carlo, low-discrepancy
sequences are of fundamental importance. Indeed, for such sequences, the rate of
convergence when estimating a probability (10.16) is $(\log m)^d / m$, better than the
Monte Carlo one of $1/\sqrt{m}$. In practice, Halton [373], Hammersley [415], Faure
[268], Sobol [711], or Niederreiter [558] sequences are usually used, the latter two
generally considered the best in terms of robustness with respect to increasing the
number of dimensions (Sobol) or asymptotic properties (Niederreiter), especially
when they are associated with what are known as scrambling techniques [582]. An
illustration of two Halton sequences is provided in Fig. 10.10.
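As an illustration of a low-discrepancy sequence, the following sketch generates Halton points from the van der Corput radical inverse and uses them in place of random draws in the estimator (10.17). It is a minimal, unscrambled version written for readability; the function names and the choice of bases (2, 3) are illustrative assumptions.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    value, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def halton(m, bases=(2, 3)):
    """First m points of the Halton sequence in [0, 1]^d, with d = len(bases)."""
    return [tuple(radical_inverse(k, b) for b in bases) for k in range(1, m + 1)]


def qmc_probability(indicator, m, bases=(2, 3)):
    """Quasi-Monte Carlo counterpart of (10.17) for inputs uniform on [0, 1]^d."""
    return sum(1 for u in halton(m, bases) if indicator(u)) / m
```

For example, `qmc_probability(lambda u: u[0] + u[1] > 1.5, 4096)` approximates the exact area 0.125 of the corresponding corner of the unit square. The bases should be pairwise coprime (typically the first prime numbers), one per dimension.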

Concepts other than deviation from uniformity using the discrepancy have been 
subsequently considered; these define useful properties for numerical experiment 
design. For instance, [416] has proposed the construction of space filling experiment 
designs which satisfy one or the other of the following two properties: 
• maximin designs: the minimal distance between two points of the design is
maximized, and the number of pairs of points exactly the minimum distance apart is minimized,
• minimax designs: the maximal distance between any point of the domain and its
closest design point is minimized, and the number of points at exactly this maximal distance is minimized.


Fig. 10.11 Latin hypercube sampling with $m = 5$ (left) and $m = 10$ (right) on $[0, 1]^2$


Such designs are generally non-unique and non-deterministic, and the selection of 
points needs to be performed numerically, e.g., using a simulated annealing algo- 
rithm. Minimax designs should generally be used in practice as they tend to better 
fill spaces, whereas maximin ones generally fill the boundaries better than the center, 
especially in high dimensions. However, numerous technical difficulties are involved 
in constructing minimax designs [629], and maximin ones are still frequently used. 
A recent overview of the construction and properties of these types of experiment 
design is provided in [628]. 
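To fix ideas on the maximin criterion, the sketch below evaluates the minimal pairwise distance of a design and keeps the best of several random candidate designs. This crude random search is a deliberately simple substitute for the simulated annealing mentioned above, and the function names are illustrative assumptions.

```python
import math
import random


def min_pairwise_distance(design):
    """Maximin criterion: smallest Euclidean distance between two design points."""
    return min(math.dist(a, b)
               for i, a in enumerate(design) for b in design[i + 1:])


def random_design(m, d, rng):
    """m points drawn uniformly on [0, 1]^d."""
    return [tuple(rng.random() for _ in range(d)) for _ in range(m)]


def crude_maximin(m, d, n_candidates, rng):
    """Keep the candidate design with the largest minimal pairwise distance."""
    best, best_score = None, -1.0
    for _ in range(n_candidates):
        candidate = random_design(m, d, rng)
        score = min_pairwise_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

A call such as `crude_maximin(10, 2, 50, random.Random(1))` returns a 10-point design on the unit square together with its criterion value; dedicated optimizers would generally do much better for the same budget.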


Stochastic approaches. The Latin Hypercube sampling (LHS) method was historically
one of the first stochastic simulation methods to improve the filling of the space
$\Phi$ (or $[0, 1]^d$) with respect to classical uniform simulation. The idea is to partition the
interval for each input variable $\phi_i$ into $N$ segments of equal probability, then
uniformly draw from each of these intervals (see Fig. 10.11 for an illustration in 2
dimensions). The sample of $m = N$ points is then produced by randomly combining these
draws across the $d$ inputs, while supposing that the input variables are independent [509, 721].
LHS has since been extended to treat dependent cases [404, 611], work with non-hypercubic
domains, and incorporate distance constraints between points generated using maximin
approaches [49, 511]. In summary, this method ensures uniform coverage of the domain of each input,
and leads to faster overall convergence of classical statistical estimators [385].
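The construction described above can be sketched as follows on $[0, 1]^d$: each axis is cut into $n$ equal-probability segments, one value is drawn in each segment, and the segments are randomly paired across dimensions. The function name is an illustrative assumption; inputs with other marginal distributions would be obtained afterwards by applying the inverse marginal distribution functions.

```python
import random


def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with exactly one point per segment on every axis."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of segments across dimensions
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(column[k] for column in columns) for k in range(n)]
```

Calling `latin_hypercube(5, 2, random.Random(0))` produces a design of the kind shown on the left of Fig. 10.11.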

The good uniform coverage properties of LHS points are nevertheless marginal,
in the sense that correlations between the input dimensions are not generally taken
into account. To do so, orthogonal arrays [381] can be used, which allow points
to be distributed in subspaces of $\Phi$ (or $[0, 1]^d$). A combination of the best space
exploration features of both is also possible [476].
The statistical estimation of probabilities $p$ or quantiles $z$ by a simple static
improvement to numerical experimental designs (using a better uniform distribution
of points in the space $[0, 1]^d$ before inverting using $F(\cdot, \dots, \cdot)$) is not, however,
entirely satisfying. Indeed, these experimental designs are constructed using
dependent values, which makes it difficult to formally prove consistency and
convergence of the statistical estimators $\hat{p}_m$ and $\hat{z}_m$ to their true values $p$ and $z$, since
asymptotic results like Theorem 4.5 and bootstrap techniques are no longer valid.
The work of Nakayama [224, 544] has nevertheless led to significant progress in
adapting these theorems when estimating quantiles using optimal sampling. However,
in general (and in particular in extreme value settings [336]) it is necessary to
construct new statistical estimators associated with sampling designs that are both
stochastic and progressive (also called sequential or iterative), and that improve an
original experimental design, in order to improve on the Monte Carlo approach [352].


10.3.4.2 Adapted Statistical Estimators


A full overview of techniques for constructing statistical estimators with less variance 
than those using Monte Carlo sampling is given in [476, 673]. From the 1990s on,
research at EDF [91, 115, 336, 533, 537], CEA [137], INRIA [148, 149, 367],
ONERA [524, 527], the Ecole Supérieure d’Électricité (Supélec) [63, 64], and the
University of Sheffield [566] has led to several non-intrusive approaches well
suited to attaining small probabilities and extreme quantiles.

These methods generally share certain useful ideas such as control variates,
stratification, directional simulation, preferential (or importance) sampling, multi- 
level sampling, and the selection of criteria to optimize when choosing the next point 
in progressive sampling. In the following paragraphs, we rapidly summarize these 
ideas. Other classical variance-reduction techniques like conditional Monte Carlo 
sampling and the antithetic variables method are not considered to be sufficient for 
attaining small probabilities and extreme quantiles [336]. 


Meta-modeling. We begin by introducing the idea that sampling can be efficiently
guided using meta-models, also known as response surfaces [20], the name given to
any function $\tilde{g}$ which “mimics” the behavior of $g$ over a domain $\Phi_1 \subseteq \Phi$ of interest
but with much lower (negligible) computational time. Constructed (estimated) from
an initial experimental design obtained using tools from Sect. 10.3.4.1, and possibly
refined after each improvement to it, this may be, in general:


• a deterministic interpolator: linear, quadratic, polynomial (e.g., spline), or logistic
regression model, neural network, etc.

• a stochastic (or kernel) interpolator: e.g., Gaussian-process based kriging [680]
derived from geostatistics, or polynomial chaos (or a combination of both [436]).
Such interpolators are based on the hypothesis that the value $x = g(\phi)$ can be
considered as the output of a stochastic process, with the goal of better taking into
account the meta-modeling error when using $\tilde{g}$ in place of $g$, and of deducing the zones
of $\Phi_1$ to preferentially explore when calculating the quantities of interest $p$ and $z$.


3 More generally, work performed by the CNRS Mascot-Num research group
(http://www.gdr-mascotnum.fr/).


Readers interested in knowing more about the extensive world of meta-models can 
refer to numerous recent publications; in particular we recommend [439] and the 
references given in [20]. 


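Control variates. For calculating the probability (10.16), the control variates method
[483] consists of introducing an auxiliary variable $\xi(\phi)$ such that $E[\xi(\phi)] = \xi_0$ is known:

$$p = E\left[1_{\{g(\phi) > x_0\}}\right] = E\left[1_{\{g(\phi) > x_0\}} - \xi(\phi)\right] + \xi_0,$$

estimated via Monte Carlo sampling using:

$$\hat{p}_{mc} = \frac{1}{m} \sum_{k=1}^{m} \left[1_{\{g(\phi_k) > x_0\}} - \xi(\phi_k)\right] + \xi_0,$$

where

$$V[\hat{p}_{mc}] = \frac{1}{m} \left[p(1 - p) + V[\xi(\phi)] - 2\,\text{Cov}\left\{1_{\{g(\phi) > x_0\}},\ \xi(\phi)\right\}\right].$$

If $\xi(\phi)$ is strongly and positively correlated with the indicator function $1_{\{g(\phi) > x_0\}}$,
the term $-2\,\text{Cov}\{\cdot\}$ becomes negative and, as soon as twice the covariance exceeds $V[\xi(\phi)]$,
leads to a decrease in variance with respect to the variance $\sigma^2$ of
the usual Monte Carlo estimator. This approach can also be modified to calculate
quantiles, using a meta-model as the control variate [388].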
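The mechanism can be illustrated with a toy example in which the control variate is the exceedance indicator of a cheap approximation of the model. Everything in the sketch below is an assumption made for illustration: the model `g`, its surrogate `g_tilde`, uniform inputs on $[0, 1]^2$, and the threshold, chosen so that the mean of the control variate is known exactly ($0.2^2/2 = 0.02$).

```python
import random
from statistics import fmean, pvariance

X0 = 1.8      # threshold x0 (illustrative)
XI_0 = 0.02   # exact mean of the control variate: P(phi_1 + phi_2 > 1.8)


def g(phi):
    """Hypothetical 'expensive' model."""
    return phi[0] + phi[1] + 0.02 * phi[0] * phi[1]


def g_tilde(phi):
    """Cheap surrogate of g whose exceedance probability is known exactly."""
    return phi[0] + phi[1]


def control_variate_estimate(m, rng):
    """Return (plain MC estimate, control variate estimate, and the estimated
    variances of both estimators), all computed from the same m draws."""
    indicators, differences = [], []
    for _ in range(m):
        phi = (rng.random(), rng.random())
        ind = 1.0 if g(phi) > X0 else 0.0
        xi = 1.0 if g_tilde(phi) > X0 else 0.0   # control variate xi(phi)
        indicators.append(ind)
        differences.append(ind - xi)
    p_mc = fmean(indicators)
    p_cv = fmean(differences) + XI_0
    return p_mc, p_cv, pvariance(indicators) / m, pvariance(differences) / m
```

Because the two indicators disagree only on a thin strip of the input space, the estimated variance of the control variate estimator is several times smaller than that of the plain estimator for the same number of calls to `g`.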


Stratification.
The idea behind stratified sampling is to partition the domain $\Phi$ into disjoint
subdomains known as strata (or layers):

$$\Phi = \bigcup_{i=1}^{M} \Phi_i \quad \text{such that} \quad \Phi_i \cap \Phi_j = \emptyset \ \text{ when } i \ne j.$$

The integral in (10.16) is then calculated in each stratum, and weighted by the volume
$\omega_i = \int_{\Phi_i} f(\phi)\, d\phi$:

$$\int_{\Phi} 1_{\{g(\phi) > x_0\}}\, f(\phi)\, d\phi = \sum_{i=1}^{M} \omega_i \int_{\Phi_i} 1_{\{g(\phi) > x_0\}}\, \frac{f(\phi)}{\omega_i}\, d\phi \approx \sum_{i=1}^{M} \omega_i\, \frac{1}{M_i} \sum_{k=1}^{M_i} 1_{\{g(\phi_{k,i}) > x_0\}},$$

where $M_i$ is the number of Monte Carlo draws $\phi_{1,i}, \dots, \phi_{M_i,i}$ in stratum $i$, with
$\sum_{i=1}^{M} M_i = m$. This method helps, for example, to ensure that draws occur in the
tails of distributions. The number of draws $M_i$ in each stratum, as well as the number
of strata, can be optimized with respect to the desired precision of the estimator

$$\hat{p}_m = \sum_{i=1}^{M} \frac{\omega_i}{M_i} \sum_{k=1}^{M_i} 1_{\{g(\phi_{k,i}) > x_0\}}$$

and the postulated order of magnitude of $p$. This technique, which does not guarantee
a reduction in variance [336], is generally combined with directional simulation (see
below) to produce powerful estimators of the probability $p$ [537]. Coupled with the
use of a meta-model to guide the choice of partition, it can also be used to estimate
extreme quantiles [137].
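A one-dimensional sketch of the stratified estimator is given below. The model `g`, the uniform input on $[0, 1]$, the three strata, and the threshold are all illustrative assumptions, chosen so that the exact probability is $p = 0.01$ and the weight of each stratum is simply its length.

```python
import random

X0 = 9.801   # threshold: g(phi) > X0 if and only if phi > 0.99
STRATA = [(0.0, 0.5), (0.5, 0.9), (0.9, 1.0)]   # partition of the input space [0, 1]


def g(phi):
    """Hypothetical monotone model of a single input uniform on [0, 1]."""
    return 10.0 * phi ** 2


def stratified_estimate(x0, draws_per_stratum, rng):
    """Weighted sum over strata of the within-stratum exceedance frequencies."""
    estimate = 0.0
    for (low, high), m_i in zip(STRATA, draws_per_stratum):
        weight = high - low                       # probability mass of the stratum
        hits = sum(g(rng.uniform(low, high)) > x0 for _ in range(m_i))
        estimate += weight * hits / m_i
    return estimate
```

Allocating as many draws to the thin upper stratum as to the others, e.g. `stratified_estimate(X0, [2000, 2000, 2000], random.Random(0))`, forces a large share of the budget into the tail, which a plain Monte Carlo sample of the same size would visit only rarely.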


Directional simulation. This method involves transforming the physical space $\Phi$
into a standard Gaussian one using an isoprobabilistic transform $y = \Psi^{-1} \circ F(\phi)$
(where $\Psi$ is the joint cumulative distribution function of a standardized multivariate
Gaussian distribution), randomly generating directions in this space, estimating
probabilities $p$ conditional on these directions, then averaging these estimates to
give an unconditional estimator of $p$. Though limited in the number of dimensions
it can handle, this approach is useful for performing an initial exploration of a space
and detecting influential variables using sensitivity analysis [407]. The idea of
transforming the space was originally used in older reliability methods (FORM-SORM⁴
methods), which turn the probability estimation problem into an optimization one but
offer little or no information on the estimation error involved [475]. Requiring very
little computation time, but not very robust with respect to the number of dimensions,
these methods can also be adapted to iteratively look for quantiles of interest [215,
610].


4 First and second order reliability methods [475].


Preferential or importance sampling (IS). The principle of IS is to change the draw
density $f(\phi)$ to $f_{IS}(\phi)$ and introduce a ratio of densities into the statistical estimator
of the quantity of interest. The idea is to concentrate the random draws in an area
of interest, i.e., one that corresponds to extreme situations $X$. Under non-restrictive
conditions on the support, and with $\phi_1, \dots, \phi_m \overset{iid}{\sim} f_{IS}$, the estimator (10.17) is replaced by

$$\hat{p}_{IS} = \frac{1}{m} \sum_{k=1}^{m} \frac{f(\phi_k)}{f_{IS}(\phi_k)}\, 1_{\{g(\phi_k) > x_0\}}. \qquad (10.24)$$

An optimal choice for $f_{IS}$ is theoretically possible [672]:

$$f_{IS}^*(\phi) = \frac{1_{\{g(\phi) > x_0\}}\, f(\phi)}{\int_{\Phi} 1_{\{g(\phi) > x_0\}}\, f(\phi)\, d\phi}, \qquad (10.25)$$

characterized by a null variance, but this is unattainable in practice as it corresponds
to knowing how to explicitly solve the problem (i.e., knowing the domain of $\phi$
where $g(\phi) > x_0$). Nevertheless, this result is a fundamental one because almost
all modern approaches based on IS look to dynamically construct⁵ a succession of
approximations to the ideal distribution (10.25). This can typically be done:


• by learning about $g$, via a meta-model [51, 56, 232], while reducing the dimensions
of the problem to the most influential components of $\phi$ [538];

• by roughly yet rapidly “zooming” in to the zone of $\Phi$ close to the boundary $g(\phi) = x_0$,
using, for example, FORM-SORM approaches [295, 409];

• by learning directly about $f_{IS}$, e.g., using a mixture of kernels whose weights are
updated [139, 183], or in an entirely nonparametric way [599].
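A minimal, non-adaptive illustration of the importance sampling estimator of an exceedance probability, with its ratio of densities, is sketched below for a scalar standard Gaussian input and the trivial model $g(\phi) = \phi$. The instrumental density, a unit-variance Gaussian centered at the threshold, is an assumption made for the example and is not the optimal density discussed above.

```python
import math
import random


def importance_sampling_tail(x0, m, rng):
    """Estimate P(phi > x0) for phi ~ N(0, 1) by drawing from f_IS = N(x0, 1)."""
    total = 0.0
    for _ in range(m):
        phi = rng.gauss(x0, 1.0)                  # draw from the instrumental density
        if phi > x0:                              # indicator of the extreme event
            # density ratio f(phi) / f_IS(phi) for N(0, 1) against N(x0, 1)
            total += math.exp(-x0 * phi + 0.5 * x0 * x0)
    return total / m


def exact_tail(x0):
    """Exact Gaussian tail probability, used only to check the estimate."""
    return 0.5 * math.erfc(x0 / math.sqrt(2.0))
```

For $x_0 = 4$ the target probability is about $3.2 \times 10^{-5}$: a sample of 20,000 draws from the shifted density gives a relative error of a few percent, whereas the same number of plain Monte Carlo draws would most often contain no exceedance at all.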


The inversion of the cumulative distribution function $F_X$ of $X$ required in (10.19) can
also benefit from (10.24) by defining the estimator

$$\hat{F}_{IS}(x) = \frac{1}{m} \sum_{k=1}^{m} \frac{f(\phi_k)}{f_{IS}(\phi_k)}\, 1_{\{g(\phi_k) \le x\}},$$

and the estimation of a quantile $z$ can be achieved using classical [340, 388] or
adaptive [246, 523] IS. In particular, several strategies involving adaptive importance
functions were tested by [599]. The most recent results use a mix of meta-modeling
and adaptive IS [532, 691]. The incremental part of the adaptive methods can
notably be performed by taking advantage of statistical learning tools (it is then called
active learning [691]).


Example 10.3 (Prediction of climate extremes) 


A promising use of adaptive importance sampling applied to temperature
trajectories derived from climate models, with random initial conditions, is 
based on iterative resampling of importance weights in order to select periods 
with extreme temperatures [633]. In doing so, these authors are capable of 
parallel simulation of rare (or never observed) heatwaves over periods of 90 
days or more with the help of the principle of large deviations [745] which 
specifies the way in which the extreme values of an unknown probability 
distribution behave, in order to estimate extreme return levels (associated with 
probabilities of between 107% and 1077 per year). The “zoom” enabled by 
importance sampling drastically decreases the required number of calls to the 
climate model (Planet Simulator [287]) by a factor of 100-1000. 


5 Namely, during the numerical exploration of g, through a progressively enriching experimental 
design. 
Multilevel sampling. Multilevel sampling, often called subset simulation [48] or
importance splitting [147, 149, 472, 526], is based on the principle of expressing
the quantity to be estimated as a function of a decreasing (in size) sequence of
nested subsets containing the desired domain. The most illuminating example of this
involves the estimation of the exceedance probability $p = P(X > x_0)$. This can be
rewritten as

$$p = \prod_{i=1}^{M} P(X \in D_i \mid X \in D_{i-1}),$$

where $D_M = \{x \in \chi,\ x > x_0\}$, $D_0 = \chi$, and $\forall i \in \{1, \dots, M - 1\}$,

$$D_i = \{x \in \chi,\ x > \varepsilon_i\},$$

with $\varepsilon_1 < \dots < \varepsilon_{M-1} < x_0$. If the sizes of the subsets $D_i$ (i.e., the values of $\varepsilon_i$) are well
chosen, each conditional probability $P(X \in D_i \mid X \in D_{i-1})$ can be estimated using
a (perhaps accelerated) Monte Carlo method, simulating according to the density

$$f_{X \mid D_{i-1}}(x) = \frac{f_X(x)\, 1_{\{x \in D_{i-1}\}}}{P(X \in D_{i-1})}. \qquad (10.26)$$


This simulation can be performed using Markov chains (Sect. 4.1.5): Markov chain 
Monte Carlo (MCMC). The most common implementation of this method is based 
on the acceptance-rejection Metropolis-Hastings algorithm, which is recalled in 
Chap. 11 (in a Bayesian context). This type of algorithm was implemented by [48] 
and has undergone numerous improvements since (see, for example, [367, 524] for 
a recent history). 

These improvements are essentially based on the construction of the Markov
chains themselves in ways which guarantee accelerated convergence to each target
distribution (10.26), as well as the construction of better estimators of each
probability $P(X \in D_i \mid X \in D_{i-1})$. However, the main difficulty posed by the use of MCMC
is weak control over the number of calls required to $g$. For this reason, several authors
like [64] have proposed a modification to these methods based on meta-models which
have low computation costs, under the name Bayesian subset simulation. Multilevel
sampling methods are made more efficient by an adaptive selection of thresholds,
choosing them as fixed-order quantiles. The number of steps $M$ then becomes
random.

Finally, these techniques can be modified in order to estimate quantiles $z$ with the
help of an efficient choice of subsets $D_i$, also constructed with adaptive thresholding
[527, 599].
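The principle of multilevel sampling with adaptive thresholds can be sketched as follows. This is a bare-bones version under strong assumptions made for illustration: standard Gaussian inputs in two dimensions, a toy model `g`, a plain random-walk Metropolis kernel, and one move per chain and per pass. It is not the algorithm of any of the cited references.

```python
import math
import random


def g(phi):
    """Toy model (an assumption): X = phi_1 + phi_2 with phi ~ N(0, I_2)."""
    return phi[0] + phi[1]


def subset_simulation(x0, n, rng, level_prob=0.1, step=1.0, max_levels=30):
    """Estimate P(g(phi) > x0) as a product of conditional probabilities."""
    pop = []
    for _ in range(n):
        phi = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        pop.append((phi, g(phi)))
    p_hat = 1.0
    for _ in range(max_levels):
        values = sorted((val for _, val in pop), reverse=True)
        threshold = values[int(level_prob * n)]   # adaptive threshold: empirical quantile
        if threshold >= x0:
            # Last level: count the samples that reach the target domain.
            return p_hat * sum(val > x0 for _, val in pop) / n
        chains = [s for s in pop if s[1] > threshold]
        if not chains:
            raise RuntimeError("degenerate level: no sample above the threshold")
        p_hat *= len(chains) / n
        new_pop = []
        while len(new_pop) < n:
            for j, (phi, val) in enumerate(chains):
                cand = [c + step * rng.gauss(0.0, 1.0) for c in phi]
                # Metropolis ratio for the standard Gaussian density (symmetric proposal)
                log_ratio = 0.5 * (sum(c * c for c in phi) - sum(c * c for c in cand))
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    cand_val = g(cand)
                    if cand_val > threshold:      # stay inside the current subset
                        chains[j] = (cand, cand_val)
                new_pop.append(chains[j])
                if len(new_pop) == n:
                    break
        pop = new_pop
    raise RuntimeError("target level not reached within max_levels")
```

With this toy model, $X$ is Gaussian with variance 2, so `subset_simulation(5.0, 1000, random.Random(0))` can be compared with the exact value of about $2 \times 10^{-4}$. The number of levels actually visited depends on the sample, which is the randomness of $M$ mentioned above.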
10.3.4.3 Conclusions on the Use of Methods and Tools


The use of calculation codes in place of insufficient real or reconstructed data is an
engineering subject in its own right, more often known as treatment of uncertainties in
numerical models. The search for physically extreme situations requires numerical 
“zooming” tools that take into account different aspects of the problem: the math- 
ematical dimension itself (i.e., the number of model inputs that are considered to 
be stochastic), the computational time (which generally increases as the algorithmic 
model implemented in the code becomes more complex as knowledge of the physical 
phenomenon improves), the level of extremeness required (i.e., the a priori order of 
magnitude of an exceedance probability, the range of values considered for a return 
level—given as a quantile), and lastly, prior knowledge of part of the phenomenon’s 
physics, notably whether it is smooth or not.

A lack of smoothness generally comes in the form of edge effects. A validated code
(cf. Sect. 10.3.6) must be able to reproduce any loss of continuity in the physical
phenomenon. A loss of differentiability, weaker than that of continuity, also needs
to be taken into account. An example of locally non-differentiable physics is that of
the water level of a watercourse which, on reaching the level of a protective dike,
overflows into the flood plain* and thus radically changes in value, due to spreading
out over a much larger volume than the river bed* (where it usually flows).

The presence of edge effects is much in evidence at the boundary of a domain 
of validity, precisely in situations bordering a safety domain. The Monte Carlo
method—by the fact that it requires no hypotheses on the numerical model—remains 
the preferred method here, on the condition that we can overcome computational time 
difficulties, very generally correlated with a good recovery of critical situations. In 
other situations, an additional hypothesis on the physics, possibly a conservative one 
(e.g., monotone outputs with respect to inputs), may allow accelerated methods to 
be used. 

However, the most common hypothesis is regularity or smoothness, which
corresponds both to a set of differentiability classes and to a certain stationarity of the
phenomenon. In particular, the use of meta-models with continuity and differentiability
features in the place of code requires this hypothesis in practical settings.⁶ This is
true notably for Gaussian processes (meta-modeling using kriging), which, since the 
start of the century, have become commonly used tools in studies dealing with uncer- 
tainty. The use of classification tools coupled with continuous meta-models can help 
to free ourselves from some regularity hypotheses (e.g., stationarity—see [353]), 
but this area of research is still in its infancy. An important and unresolved problem 
with the increasing use of meta-models is the question of whether they can produce 
confidence intervals which do not underestimate the estimation error. Indeed, true 
confidence intervals must take into account both the error related to the choice of an 
estimator and that of the meta-model. 


6 Not necessarily from a theoretical point of view since the validity of a statistical estimator can 
remain independent of that of a meta-model, which generally helps to guide the numerical design. 
Table 10.1 Summary of the uses and properties of rare event evaluation methods (from [406]).
Y/N (yes/no) means that the method can be used (or not) for the given goal or requires (or not)
hypotheses on the code. The notation (−, ~, +) is a scale representing increasing quality. An empty
entry means that the goal is attainable but the methods and tools required are still being developed
or unavailable


MC QMC Wilks AIS SS F(S)ORM MRM GP BSS
Probability p Y Y N Y Y 
Level of z return, Y Y Y Y Y 
quantile 
Hyp. of N Y N N N 
Adaptive N N N Y Y 
Dimension + a 
Rarity of the — — — ~ 


event 


Computing cost + du 


Error 


Ease of running + 


Inspired by the work of B. Iooss [406], Table 10.1 provides a summary, in terms 
of relative advantages and disadvantages, of the best methods mentioned above for 
calculating exceedance probabilities, which are more or less comparable when esti- 
mating quantiles. The abbreviations used for the methods are as follows: 


MC: Monte Carlo, 

QMC: Quasi-Monte Carlo, 

Wilks: Wilks estimator, 

AIS: Adaptive importance sampling,

F(S)ORM: First (Second) order reliability methods, 

MRM: Monotonic reliability method, 

GP: Meta-modeling using Gaussian kriging, 

BSS: Multilevel sampling coupled with meta-modeling using Gaussian kriging 
(Bayesian subset sampling). 


Furthermore, Fig. 10.12 shows the settings in which each of the main methods can 
be used in order to help engineers who wish to quickly test one or several of the 
approaches. 


10.3.5 Practical Implementation of Methods 


Fig. 10.12 A classification of rare event methods (adapted from [406])


Monte Carlo, quasi-Monte Carlo, and space-filling algorithms have long been
available in standard statistical software packages (e.g., randtoolbox and lhs
in R) and in software for analyzing uncertainty (OpenTURNS [20]:
http://www.openturns.org). Research on, and industrialization of, the great majority
of the more advanced methods presented above benefit greatly from the dynamism of
the highly active MASCOT-NUM (http://www.gdr-mascotnum.fr) and SIAM-UQ
communities. This dynamism is reflected in the rapid availability (principally in
R and Python) of tools for the construction of refined experimental designs and
advanced meta-modeling using kriging, accompanied by calculation optimization
methods and even estimation methods for quantities of interest. We cite in particular
DiceDesign, DiceKriging, and DiceOptim [670], resulting from work undertaken by the
DICE (http://dice.emse.fr/) and then ReDICE (http://www.redice-project.org/)
consortiums, which are close to MASCOT-NUM. Certain tools from the mistral package
run the Wilks and MRM approaches for calculating probabilities and conservative
quantiles.
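As an illustration of what a Wilks-type conservative quantile involves, the following minimal sketch computes the number of code runs required by the standard first-order, one-sided Wilks formula. It is not the interface of the mistral package; the function names `wilks_sample_size` and `wilks_upper_bound` are hypothetical and chosen for this example only.

```python
import math


def wilks_sample_size(alpha: float, beta: float) -> int:
    """Smallest n such that the maximum of n i.i.d. code outputs exceeds the
    alpha-quantile with confidence at least beta (first-order, one-sided)."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    # P(max <= q_alpha) = alpha**n, hence the requirement 1 - alpha**n >= beta.
    return math.ceil(math.log(1.0 - beta) / math.log(alpha))


def wilks_upper_bound(outputs):
    """Conservative quantile estimate: the largest simulated output."""
    return max(outputs)


# Number of runs needed for a 95% quantile bounded with 95% confidence.
n_runs = wilks_sample_size(alpha=0.95, beta=0.95)
print(n_runs)
```

The required number of runs does not depend on the input dimension or on any regularity hypothesis on the code; the price is a bound that may be loose.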

Polynomial chaos expansions are available in the GPC, polychaosbasics, and
plspolychaos packages, alongside sensitivity and regression analysis tools.
Finally, a few hybrid uncertainty analysis tools, combining probabilities with tools
from extra-probabilistic theories, can be found in the recent HYRISK package.


10.3.6 Validation in Extreme Settings


Validation corresponds to conferring on the model g and its implementation⁷ a
certain objectivity in its predictive description of the phenomenon linking φ to X.
This objectivity is derived from a Popperian consensus [450, 622], i.e., it results
from a consensual decision by the protagonists (the scientific community and, beyond
that, civil society), supported by the publication of repeated testing of the model.
A consensual definition for the validation of a computational model was suggested by
the US National Research Council [13] in 2006:

Validation is the process of determining the degree to which a model is an accurate 
representation of the real world from the perspective of the intended uses of the model. 

This is a response to a concern which appears in numerous reference documents
in this discipline [38, 39, 539, 540] and is shared internationally under the general
framework of verification, validation and uncertainty quantification (VV&UQ). In
principle, such repeated trials must allow simulations of X to be compared with
real observations, and help delimit a domain of validity Φ_v ⊂ Φ of the numerical
model, i.e., the domain covering the ranges of the input parameter values φ that lead
to simulations of X that correspond to real physical situations, with an acceptable
margin of error. As stated by Chang and Hanna (2004), who built a software package
dedicated to validating computational models [153]:

(...) according to Oreskes et al. [578], verification and validation of numerical
models of natural systems are impossible, because natural systems are never closed
and because model solutions are always non-unique. The random nature of the
process leads to a certain irreducible inherent uncertainty. Oreskes et al. [578]
suggest that models can only be confirmed or evaluated by the demonstration of
good agreement between several sets of observations and predictions [152].

The notion of an acceptable error thus presupposes knowledge, or even perfect
control, of the experimental conditions and model, and that it is possible to define
metrics Δ(x, x*) that can compare the simulation results x with observations x*
[567]. Such metrics yield indicators which must be


• calculable with a reasonable calculation cost (and thus can be produced for large
  volumes of data coming from simulations);

• interpretable;

• always well defined and robust (in particular: able to deal with missing data, e.g.,
  resulting from interrupted numerical calculations).


In practice, such indicators may 


1. be visual, typically in the form of QQ-plots or scatter plots, cf. Fig. 10.13;

2. more formally, correspond to the result of a statistical test [569], which also
   implies being able to set limit values d such that, under the (null) hypothesis
   of agreement between the model and reality, we have Δ(x, x*) < d with high
   probability 1 − α. However, no test statistic is universally good for validation
   [569], nor is there a universally good way to choose the threshold values α. It is
   thus necessary to use several tests and a number of classical thresholds
   recommended by [417]: thresholds of 1% or even 0.1% on p-values are now considered
   more acceptable than the traditional choice of 5%, as they provide a reasonable
   agreement with Bayesian tests (see Chap. 11).


7 Supposed verified in the sense that the implemented algorithm correctly translates the model up to
a controlled error [530].

Fig. 10.13 Illustration of a QQ-plot highlighting the possible disagreement between real and
simulated data


According to [152, 153], such indicators can be separated into two groups,
summarized in Table 10.2:


1. weak indicators, which provide a local vision of the validation (e.g., on average,
   the real phenomenon is reproduced, or its direction of variation is coherent with
   that proposed by the numerical model, etc.);

2. strong indicators, which suggest a better and more general match. A combination of
   weak indicators can produce a strong indicator.
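A few of these weak and strong indicators can be computed in a handful of lines. The sketch below follows the usual definitions of the indicators used in [152, 153] (bias, fractional bias, geometric bias, FAC-α, normalized mean quadratic error, geometric variance, maximal absolute error); the function name `validation_indicators` and the dictionary keys are hypothetical, and the data are assumed strictly positive so that the logarithms exist.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def validation_indicators(sim, obs, alpha=0.5):
    """Weak and strong indicators comparing simulations `sim` with observations `obs`.

    Both sequences must have the same length and strictly positive entries.
    """
    pairs = list(zip(sim, obs))
    m_sim, m_obs = _mean(sim), _mean(obs)
    log_diff = [math.log(o) - math.log(s) for s, o in pairs]
    sq_diff = [(s - o) ** 2 for s, o in pairs]
    return {
        # Weak indicators
        "bias": m_obs - m_sim,
        "fractional_bias": 2.0 * (m_obs - m_sim) / (m_obs + m_sim),
        "geometric_bias": math.exp(_mean(log_diff)),
        "fac": _mean([1.0 if alpha <= s / o <= 1.0 / alpha else 0.0 for s, o in pairs]),
        # Strong indicators
        "nmse": _mean(sq_diff) / (m_sim * m_obs),
        "geometric_variance": math.exp(_mean([d ** 2 for d in log_diff])),
        "max_abs_error": max(abs(s - o) for s, o in pairs),
    }


observed = [1.0, 2.0, 4.0]
simulated = [2.0, 4.0, 8.0]  # a model that systematically doubles the observations
print(validation_indicators(simulated, observed))
```

With `alpha=0.5`, the FAC indicator is the fraction of simulated values lying within a factor of two of the observations.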


One of the most robust validation metrics is the area metric $d_{AM}$ proposed by
Ferson et al. [274], defined as the difference in $L^1$ norm between the cumulative
distribution function $F_{x^*}$ of the real data and $F_x$ of the simulations of X
produced by g(·):

$$d_{AM} = \int_{\chi} \left| F_{x^*}(u) - F_x(u) \right| \, du,$$

Table 10.2 The main strong and weak validation indices in the sense of [152, 153]. We denote by
$x = (x_1, \ldots, x_n)$ a vectorial output of the model g and by $x^* = (x_1^*, \ldots, x_n^*)$ a real
observation. Here, $y_{(i)}$ is the rank of $y_i$ in its sample ($i = 1, \ldots, n$). The notation
$\bar{y}$ means the arithmetic mean of the vector y, and
$\sigma_y = \left(\overline{(y - \bar{y})^2}\right)^{1/2}$

Weak indicators

Bias                               $\bar{x^*} - \bar{x}$
Bias factor                        $\bar{x^*} / \bar{x}$
Fractional bias                    $2(\bar{x^*} - \bar{x}) / (\bar{x^*} + \bar{x})$
Geometric bias                     $\exp\left(\overline{\log(x^*)} - \overline{\log(x)}\right)$
FAC-α (fraction of data)           $\overline{\mathbb{1}_{\{\alpha \le x/x^* \le 1/\alpha\}}}$
Pearson's linear correlation       $\overline{(x - \bar{x})(x^* - \bar{x^*})} / (\sigma_x \sigma_{x^*})$
Spearman's rank correlation        $\overline{(x_{(\cdot)} - \overline{x_{(\cdot)}})(x^*_{(\cdot)} - \overline{x^*_{(\cdot)}})} / (\sigma_{x_{(\cdot)}} \sigma_{x^*_{(\cdot)}})$

Strong indicators

Mean quadratic error ($L^2$ norm)    $\sqrt{\overline{(x - x^*)^2}}$
Normalized mean quadratic error    $\overline{(x - x^*)^2} / (\bar{x}\, \bar{x^*})$
$R^2$ prediction coefficient       $1 - \overline{(x - x^*)^2} / \overline{(x^* - \bar{x^*})^2}$
Geometric variance                 $\exp\left(\overline{(\log(x^*) - \log(x))^2}\right)$
Maximal absolute error ($L^\infty$ norm)   $\max_{i = 1, \ldots, n} |x_i - x_i^*|$


which can easily be estimated as the area between the empirical cumulative
distribution functions (Fig. 10.14). This metric has the advantages of being always
defined, fairly robust, and producing results in the same measurement units as
X ∈ χ. The same authors, faced with difficulties in dealing with multivariate outputs
X, have proposed an aggregation technique called u-pooling which, though it has the
advantage of providing a visual diagnostic tool, supposes that the comparison indices
between empirical and simulated distributions have equivalent weights per dimension.
We instead prefer (for example) recent statistical tests matching observed and
simulated data which are based on a set of multivariate statistics (known as energy
statistics) involving Euclidean distances between data [729, 730]. Such tests are
currently some of the most powerful available when the dimension of X becomes large
[732] and are often used by the machine learning community.
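The empirical estimation of the area metric can be sketched as follows for a scalar output. Both empirical cumulative distribution functions are piecewise constant, so the integral reduces to a finite sum over the intervals between consecutive pooled data points. The function name `area_metric` is hypothetical; this is a minimal illustration, not the implementation of [274].

```python
import bisect


def area_metric(observed, simulated):
    """Area between the empirical CDFs of two one-dimensional samples."""
    obs_sorted = sorted(observed)
    sim_sorted = sorted(simulated)
    # Both empirical CDFs are constant between consecutive pooled points.
    grid = sorted(set(obs_sorted) | set(sim_sorted))
    area = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        f_obs = bisect.bisect_right(obs_sorted, left) / len(obs_sorted)
        f_sim = bisect.bisect_right(sim_sorted, left) / len(sim_sorted)
        area += abs(f_obs - f_sim) * (right - left)
    return area


# A simulation sample shifted by two units with respect to the observations.
print(area_metric([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]))
```

For two samples that differ by a pure shift, the metric returns the size of the shift, expressed in the units of the output, which makes it directly interpretable.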

Lastly, we can expect that validation methods will be increasingly influenced by 
the spectacular advances in automatic classification seen in recent years. The starting 
assumption is simple: if a simulation model producing a great quantity of data similar
to real data is “valid”, then any classifier applied to the union of these two types
of data should not be able to distinguish between them. This suggests that, to validate
a simulation model, it should be possible to confront it with a detection model that
tests whether the data shown to it are generated or in fact correspond to real data,
with balanced test results. This is exactly what is known in classification as the
adversarial principle, which originated in game theory and has become well known due to
the rise of generative neural networks (generative adversarial networks, or GANs)
[42, 347].

Fig. 10.14 Illustration of the empirical estimation of the

The difficulty of validation in extreme-value settings stems from the fact that a
direct comparison between simulation results (known as system response quantities in
[569]) and observed data is rarely possible, since there may be very few of the latter
available. Automatic classification approaches, which usually require large quantities
of data, seem to be of little relevance here. In fact, it appears easier to confer
de facto validity on simulation results using an information fusion procedure with the
real data than to look for a binary response confirming or rejecting calculation
outputs. This type of procedure, generally based on the use of Bayes' rule (see the
following chapter), for instance Bayesian melding [170, 242], implies a comparison
metric and the recalibration of one or several uncertain inputs to the numerical code,
possibly with the help of meta-models [294, 637]. Though Bayesian inference seems
to be the most relevant and widely used methodology for calibrating computer models,
the difference between (re)calibration and validation in this setting is not always
clear [192].


Remark 10.1 It is also important to pay close attention to the following phenomenon,
which occurs often: certain key parameters of numerical models “pushed” into extreme
settings are calibrated using data coming for the most part (or entirely) from a
regular domain, i.e., a non-extreme one. This is the case in particular for boundary
conditions, as illustrated in Example 10.4, which act as hidden, non-probabilistic
parameters and therefore severely limit the pertinence of the model when it is used
close to the boundaries of its domain of validity.


Example 10.4 (Hydraulic codes) 


Hydraulic codes are typically run over a well-defined territory by providing a
hypothesis on the link between the water height and the downstream flow, through
a rating curve [223]. This corresponds to a boundary condition on the fluid
mechanics equations modeling the flow [301]. “Downstream” is a sufficiently
distant place (ideally, the ocean in the case of a river) that we can consider
this link little affected by changes in the flow domain upstream. Nevertheless,
the gauging performed to construct this curve takes place in standard flow
conditions.


Examples of such studies, however, remain rare in extreme value statistics. The 
works [181, 182], which focus on fusing information in order to estimate the fre- 
quency of extreme rainfall in a climate change context, are relevant here. Alternative 
approaches focus on validating the distributions of stochastic inputs of these com- 
putational models, in relation with worst-case scenarios most often defined as the 
values located at the borders of the models’ domain of validity. For instance, [488] 
has proposed using concentration inequalities to bound the probability of rare events 
output from numerical models, whatever the statistical characteristics of the input 
distributions are. However, such approaches are still a matter of debate, as they are
liable to produce highly conservative return level values.

At present, the validation of numerical models in extreme settings remains a
methodological “bet”, fueled by the gradual refinement of the various parts of these
models⁹ and, in the precise case of climate models, by the increasing attention paid
to problems involving changes in scale. Validating the extreme situations produced
by mesoscale climate models (typically square grids with 200 km sides), then
transferring these situations by downscaling to regional extreme events using extreme
value statistics models, is a current area of research [292] whose main constraint is
how to deal with the high levels of uncertainty that permeate climate models [689].
Methodological advances have been made in recent years to help render ensemble
climate projections probabilistic [90, 632] and then provide local projections of the
evolution of extreme climate values with the help of specific statistical models [728].


8 See, for example, [294, 664] for an illustration on hydrological models for flood forecasting
purposes.

9 Structural equations, domains of variation of parameters, numerical calculation precision, etc.


Chapter 11
Bayesian Extreme Value Theory


Nicolas Bousquet 


Abstract This chapter provides an introduction to Bayesian statistical theory and a
broad review of its application to extreme values. The Bayesian methodology differs
greatly from the traditional approaches considered in the other chapters. Indeed, it
requires the construction of so-called prior measures for the parameters of extreme
value models and defines estimation through the minimization of a cost function
adapted to the event. While Bayesian computation (such as Markov chain Monte Carlo
methods) is discussed only briefly, this chapter focuses on modeling features, which
allow expert opinion and historical knowledge to be integrated into the estimation of
quantities of interest. In this respect, the Bayesian approach constitutes a
methodology of increasing use, allowing the mixed treatment of aleatoric and
epistemic uncertainties and adapted to the specific needs of engineers.


11.1 Methodological Principles 


The great majority of the methods and tools presented in the earlier chapters treat
the problems of inference, prediction, and extrapolation in the classical statistical
framework, which supposes that the parameters $\psi$ of extreme value models $F_\psi$
take unknown but unique fixed values. This is a natural consequence of a strict
application of the Fisher–Tippett [283] and Pickands [615] theorems, the supporting
pillars of extreme value theory. This acknowledged, it is necessary to estimate these
unknown parameters using available observations $\mathbf{x}_n = (x_1, \ldots, x_n)$,
which leads to the difficulty mentioned previously in Sect. 4.2.2: rules are required
to choose the best estimator $\hat{\psi}_n$. The quality of a given estimator
$\hat{\psi}_n$ corresponds in part to the level of certainty that the statistical
confidence region it defines contains the true parameter.

The classical statistical framework is not the only one available to us, however.
The Bayesian framework, which is just as important, can help us overcome certain
limits of the classical setting, without disrupting the applicability of standard
theorems.

This chapter reviews the advantages and disadvantages of using a Bayesian
approach for handling extreme value statistics models. Examples will focus on one- 
dimensional and stationary extreme value problems, though there are increasing 
numbers of applications and improvements to such methods in the literature for the 
spatial and temporal multivariate cases. For a more in-depth look at important con- 
cepts, methods, and results in Bayesian statistics, we suggest the article [715] as an 
excellent starting point. 


11.1.1 Credibility Rather Than Confidence 


The Bayesian paradigm criticizes the notion of classical statistical confidence which
is, in reality, not so efficient for decision-making from available observed data,
since it is based on infinitely repeated virtual observations, dependent on the
estimated model. Arguments leading to the construction of confidence regions are indeed
often based on the convergence in distribution of estimators $\hat{\psi}_n$ (cf.
Sect. 4.2.1), which is in essence asymptotic (i.e., when $n \to \infty$). This is the
case for the estimators for extreme value statistical models invoked earlier. However,
the only information we have, without modeling hypotheses, is the data $\mathbf{x}_n$
itself, and the data size n may not be large enough to ensure that we are close enough
to the asymptotic setting.

Bayesian theory replaces this notion with the idea of credibility. Here, conditional
on observed data, an estimated credibility region contains $\psi$ with a certain
probability.

Conditioning $\psi$ on the observations implies that it is no longer considered as
a vector of fixed and unknown parameters, but rather as a random latent variable,
whose distribution is denoted generically by $\Pi$ (and its density $\pi$). The
existence of a probability measure on $\psi$ is assured by the De Finetti
representation theorem and extensions thereof [577] when observations are considered
to be exchangeable (and thus potentially dependent) and no longer iid. The Bayesian
framework thus encompasses the classical framework. The book [626] (Chap. I.5)
provides numerous details concerning the existence of $\Pi$.

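The inference procedure consists of updating the distribution $\Pi$, conditional on
the observations $\mathbf{x}_n$, using Bayes' theorem [651]: the prior density
$\pi(\psi)$ is modified by adding information coming from the statistical likelihood
$f(\mathbf{x}_n \mid \psi)$ into a posterior density $\pi(\psi \mid \mathbf{x}_n)$,
written as

$$\pi(\psi \mid \mathbf{x}_n) = \frac{\pi(\psi)\, f(\mathbf{x}_n \mid \psi)}{\int_{\Psi} \pi(\psi)\, f(\mathbf{x}_n \mid \psi)\, d\psi}. \tag{11.1}$$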


Hence, while classical statistics calculates an estimator $\hat{\psi}_n$ (often
maximizing the likelihood $f(\mathbf{x}_n \mid \psi)$) which has a known asymptotic
distribution, Bayesian statistics postulates that the true nature of $\psi$ is better
described by another statistical distribution defined for n fixed, and described by
Formula (11.1). Strong connections exist between the two approaches. In particular,
when $n \to \infty$ and under mild conditions, the respective distributions of
$\hat{\psi}_n$ and $\pi(\psi \mid \mathbf{x}_n)$ become similar. These connections
are described in greater detail in [651].

Fig. 11.1 Illustration using density functions of the improved information on $\psi$ from the prior
distribution and the data likelihood (seen as a function of $\psi$ and renormalized here). Curves
shown: prior distribution, normalized likelihood, posterior distribution; vertical axis: density

Figure 11.1 shows the classical form of the statistical likelihood
$f(\mathbf{x}_n \mid \psi)$ seen as a function of $\psi$, as compared with the
densities $\pi(\psi)$ and $\pi(\psi \mid \mathbf{x}_n)$. The pooling of information on
$\psi$ leads logically to a posterior distribution with a sharper (and thus more
informative) peak than the prior distribution.¹
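The update of Formula (11.1) can be illustrated numerically on a grid when the parameter is one-dimensional. The sketch below is a toy setting chosen for this illustration only: a Gumbel model with known scale, a normal prior on the location parameter, and invented data. The names `grid_posterior`, `gumbel_log_likelihood` and `annual_maxima` are hypothetical.

```python
import math

SIGMA = 1.0  # Gumbel scale, assumed known for this illustration


def gumbel_log_likelihood(mu, data, sigma=SIGMA):
    """Log-likelihood of a Gumbel(mu, sigma) sample, seen as a function of mu."""
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        total += -math.log(sigma) - z - math.exp(-z)
    return total


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(data, grid, prior_pdf):
    """Posterior density on a regular grid: prior times likelihood, renormalized."""
    step = grid[1] - grid[0]
    log_post = [math.log(prior_pdf(mu)) + gumbel_log_likelihood(mu, data) for mu in grid]
    shift = max(log_post)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(value - shift) for value in log_post]
    norm = sum(weights) * step
    return [w / norm for w in weights]


def mean_and_sd(grid, density):
    step = grid[1] - grid[0]
    mean = sum(m * d for m, d in zip(grid, density)) * step
    var = sum((m - mean) ** 2 * d for m, d in zip(grid, density)) * step
    return mean, math.sqrt(var)


annual_maxima = [10.2, 11.5, 9.8, 12.1, 10.9, 13.4, 10.1, 11.0]  # invented data
grid = [5.0 + 0.01 * k for k in range(1001)]  # location values from 5 to 15
prior_sd = 2.0
posterior = grid_posterior(annual_maxima, grid, lambda mu: normal_pdf(mu, 10.0, prior_sd))
post_mean, post_sd = mean_and_sd(grid, posterior)
print(post_mean, post_sd, prior_sd)
```

In this example the posterior standard deviation is smaller than the prior one, which is the sharpening effect described above. A grid is only practical in very low dimension; simulation methods are needed otherwise.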


Nature of the prior distribution. What does the prior distribution $\pi(\psi)$
express? It corresponds to the knowledge we may have about the possible values of
$\psi$, independent of the observations $\mathbf{x}_n$. Encoded in probabilistic form,
this type of information therefore makes it possible to add real and serious knowledge
of the phenomenon, expressed in ways other than through recorded observations:
expertise, predictions made by physical models, etc. It should be noted that prior
information on $\psi$ always exists; indeed, $\psi$ is a choice of parametrization of
the selected extreme value statistics model (MAXB or POT), and the structure of this
model is known. This gives rise to constraints on the correlation structure of
$\pi(\psi)$ (cf. Sect. 11.2). For readers interested in further possible
interpretations of $\pi(\psi)$, we recommend the book [591] and the article [598].


1 Except for cases where the two sources of information disagree; see [259].

11.1.2 Dealing with Parametric Uncertainty


The second criticism that can be made of the classical approach is its inability to
take into account all parametric uncertainties. If $\psi$ is estimated by
$\hat{\psi}_n$, whose distribution is denoted by $G_n$, how should we predict a new
observation $x_{n+1}$ of the extreme-valued natural hazard? This should in theory be
produced using the conditional distribution

$$X_{n+1} \sim f(x \mid \psi, \mathbf{x}_n). \tag{11.2}$$


1. By supposing that the next random event $x_{n+1}$ is independent of the previous
   ones $\mathbf{x}_n = (x_1, \ldots, x_n)$, the distribution (11.2) can simply be
   approximated by

   $$X_{n+1} \overset{\text{approx.}}{\sim} f(x \mid \hat{\psi}_n),$$

   but it remains necessary to include the uncertainty in $\hat{\psi}_n$, represented
   by $G_n$, to lower the approximation error. However, $G_n$ also needs to be
   estimated, generally using $\hat{\psi}_n$ itself (e.g., with the help of the delta
   method, cf. Theorem 4.4). It is thus necessary to set a practical limit which allows
   us to work with a $G_n$ with “sufficient” uncertainty to make reasonable predictions
   of $x_{n+1}$. This practical limit is linked to the precision of the estimation of
   $G_n$, which depends on asymptotic theoretical results that are often unverifiable
   in practice. Here again, reliance on asymptotics is a limiting factor, all the more
   so in a context where the number of real data is often low.

2. If we suppose that the next random event $x_{n+1}$ is dependent on certain previous
   ones in $x_1, \ldots, x_n$, the distribution (11.2) has to be estimated by

   $$f(x_{n+1} \mid \hat{\psi}_n, \mathbf{x}_n).$$

   The difficulties presented in the first point above still remain, and moreover,
   there is redundant information, arriving both from the data $\mathbf{x}_n$ and
   $\hat{\psi}_n$. This redundancy artificially inflates the available information and
   threatens to lead to an underestimation of the true uncertainty in the value of
   $x_{n+1}$, making it even more important to resolve the earlier problems.


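In the Bayesian framework, the expression (11.2) can be rewritten uniquely, without
theoretical approximations or untoward redundancy, as follows:

$$f(x_{n+1} \mid \mathbf{x}_n) = \int_{\Psi} f(x_{n+1} \mid \psi, \mathbf{x}_n)\, \pi(\psi \mid \mathbf{x}_n)\, d\psi, \tag{11.3}$$

and in the case where, conditional on $\psi$, the future observation $x_{n+1}$ is
independent of $\mathbf{x}_n$:

$$f(x_{n+1} \mid \mathbf{x}_n) = \int_{\Psi} f(x_{n+1} \mid \psi)\, \pi(\psi \mid \mathbf{x}_n)\, d\psi. \tag{11.4}$$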


It turns out that Bayesian predictions are generally better than classical ones, in 
particular when looking at extreme quantiles [708]. According to Coles [173] 
(Chap. 9), Bayesian techniques offer an alternative that is often preferable. He also
notes that as the principal use of extreme value distributions is for prognosis and 
anticipation of future events, the most natural framework to estimate them should 
be predictive [173, 177]. The framework needs to transfer as completely as possible 
uncertainty in model parameter estimation into the functions of interest (such as 
return periods). 


Example 11.31 (Forecasting floods) 


Numerous authors have warned about the unfortunate consequences of failing to
account, or accounting poorly, for uncertainty in extreme value hydraulics
problems, whether in terms of extrapolation and forecasting [454, 534, 763]
or in fitting the Manning-Strickler friction coefficient [223] for strong flow
situations (see also Remark 10.14).


Remark 11.15 Standard Monte Carlo techniques [652], requiring simulation of the
posterior distribution $\pi(\psi \mid \mathbf{x}_n)$, can be used to calculate the
integrals in Formulas (11.3) and (11.4).
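The Monte Carlo calculation of a predictive quantity of the type (11.4) can be sketched on a toy model where the exact answer is known. The setting below is an assumption made for this illustration and is not taken from the text: threshold excesses are exponential with unknown rate, the rate has a gamma prior, so the posterior is also a gamma distribution and the predictive exceedance probability has a closed form. All function names and the data are hypothetical.

```python
import math
import random


def posterior_rate_parameters(excesses, prior_shape, prior_rate):
    """Gamma posterior (shape, rate) of the exponential rate, by conjugacy."""
    return prior_shape + len(excesses), prior_rate + sum(excesses)


def predictive_exceedance_mc(x0, post_shape, post_rate, n_draws=50_000, seed=0):
    """Monte Carlo estimate of P(X_{n+1} > x0 | data): average over posterior draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # random.gammavariate takes a scale parameter, i.e. 1 / rate.
        lam = rng.gammavariate(post_shape, 1.0 / post_rate)
        total += math.exp(-lam * x0)
    return total / n_draws


def predictive_exceedance_exact(x0, post_shape, post_rate):
    """Closed form of the same integral for the exponential-gamma pair."""
    return (post_rate / (post_rate + x0)) ** post_shape


excesses = [0.3, 1.2, 0.7, 2.5, 0.4, 1.9, 0.8, 0.2, 1.1, 3.0]  # invented data
shape, rate = posterior_rate_parameters(excesses, prior_shape=1.0, prior_rate=1.0)
x0 = 5.0
p_bayes = predictive_exceedance_mc(x0, shape, rate)
p_exact = predictive_exceedance_exact(x0, shape, rate)
# Classical plug-in value using the maximum likelihood estimate of the rate.
p_plugin = math.exp(-x0 * len(excesses) / sum(excesses))
print(p_bayes, p_exact, p_plugin)
```

In this example the predictive exceedance probability is larger than the plug-in one, because integrating over the posterior carries the parametric uncertainty into the tail.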


11.1.3 Model Selection 


Several other criticisms of the classical frequentist framework are examined in detail
in the reference work [651]. In particular, classical statistical tests do not allow
two hypotheses to be compared symmetrically: rejecting one hypothesis is not the same
as giving credibility to the other (cf. Sect. 4.1.3). Besides, the arbitrary choice of
extreme threshold values in classical tests is a clear weak point [258]. The Bayesian
framework offers a clear alternative, placing several prior models on an equal footing
and then selecting the most likely one conditional on the data
$\mathbf{x}_n = (x_1, \ldots, x_n)$. Let us consider the case where the classical
statistical test is replaced by the comparison of two Bayesian models

$$\mathcal{M}_i : \quad X \mid \psi_i \sim f_i(x \mid \psi_i) \quad \text{and} \quad \psi_i \sim \pi_i(\psi_i), \qquad \text{for } i \in \{1, 2\}.$$


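Conditional on the data $\mathbf{x}_n$, the Bayes factor

$$B_{12}(\mathbf{x}_n) = \frac{P(\mathbf{x}_n \mid \mathcal{M}_1)}{P(\mathbf{x}_n \mid \mathcal{M}_2)} = \frac{\int_{\Psi_1} f_1(\mathbf{x}_n \mid \psi_1)\, \pi_1(\psi_1)\, d\psi_1}{\int_{\Psi_2} f_2(\mathbf{x}_n \mid \psi_2)\, \pi_2(\psi_2)\, d\psi_2}, \tag{11.5}$$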


or ratio of integrated likelihoods, gives the ratio of the probability that the data
match model $\mathcal{M}_1$ to the probability that they match model $\mathcal{M}_2$.
This model selection problem can be more broadly defined as part of a decision-making
framework (see Sect. 11.1.4) which provides a scale for the ratio's value (beyond what
value of $B_{12}$ can we consider that model 1 is better than model 2?). To this end,
we also include a penalty term $\rho_{12}$ related to the fact that one of the
competing models may be more flexible (or even over-parametrized) than the other, which
could lead to a selection bias. Let

$$\rho_{12} = \frac{P_{\mathcal{M}}(\mathcal{M}_1)}{P_{\mathcal{M}}(\mathcal{M}_2)},$$


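where $P_{\mathcal{M}}$ is a prior probability measure over the discrete and bounded
Bayesian model space $\mathcal{M}$. $P_{\mathcal{M}}(\mathcal{M}_1)$ and
$P_{\mathcal{M}}(\mathcal{M}_2)$ represent, in practical terms, the prior probabilities
that the Bayesian models $\mathcal{M}_1$ and $\mathcal{M}_2$ correctly model the
process producing the data $\mathbf{x}_n$, before observing any. This probability
measure is difficult to handle (except for cases where the models are supposed equally
likely a priori) and the ratio $\rho_{12}$ makes the choice easier. In this case, the
posterior probability that the model $\mathcal{M}_1$ could have produced the data
$\mathbf{x}_n$ is defined by Bayes' rule:

$$P_{\mathcal{M}}(\mathcal{M}_1 \mid \mathbf{x}_n) = \frac{P(\mathbf{x}_n \mid \mathcal{M}_1)\, P_{\mathcal{M}}(\mathcal{M}_1)}{P(\mathbf{x}_n \mid \mathcal{M}_1)\, P_{\mathcal{M}}(\mathcal{M}_1) + P(\mathbf{x}_n \mid \mathcal{M}_2)\, P_{\mathcal{M}}(\mathcal{M}_2)} = \left\{1 + \frac{1}{\rho_{12}\, B_{12}(\mathbf{x}_n)}\right\}^{-1} = \left\{1 + \rho_{21}\, B_{21}(\mathbf{x}_n)\right\}^{-1},$$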


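and the “averaged” predictive posterior distribution, which helps us better simulate
new data, is given by

$$P(x \mid \mathbf{x}_n) = P(x \mid \mathcal{M}_1, \mathbf{x}_n)\, P_{\mathcal{M}}(\mathcal{M}_1 \mid \mathbf{x}_n) + P(x \mid \mathcal{M}_2, \mathbf{x}_n)\, P_{\mathcal{M}}(\mathcal{M}_2 \mid \mathbf{x}_n).$$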
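The Bayes factor and the posterior model probability can be sketched on a pair of toy models for which the integrated likelihoods are available in closed form. This setting is an assumption made for the illustration and does not come from the text: both models are of Weibull type with known shape k (k = 1 is the exponential case), density k λ x^(k−1) exp(−λ x^k), and a gamma prior on λ. The function names and the data are hypothetical.

```python
import math


def log_marginal(data, k, a, b):
    """Log integrated likelihood for X | lam ~ Weibull(shape k, rate lam),
    lam ~ Gamma(shape a, rate b), with k known."""
    n = len(data)
    s_k = sum(x ** k for x in data)
    log_const = sum(math.log(k) + (k - 1.0) * math.log(x) for x in data)
    return (log_const + a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + n) - (a + n) * math.log(b + s_k))


def bayes_factor(data, model_1, model_2):
    """Ratio of integrated likelihoods B12."""
    return math.exp(log_marginal(data, **model_1) - log_marginal(data, **model_2))


def posterior_model_probability(b12, rho12):
    """Posterior probability of model 1 from the Bayes factor and the prior ratio."""
    return 1.0 / (1.0 + 1.0 / (rho12 * b12))


data = [0.3, 1.2, 0.7, 2.5, 0.4, 1.9, 0.8, 0.2, 1.1, 3.0]  # invented data
model_1 = {"k": 1.0, "a": 1.0, "b": 1.0}  # exponential model
model_2 = {"k": 2.0, "a": 1.0, "b": 1.0}  # Weibull model with shape 2
b12 = bayes_factor(data, model_1, model_2)
p1 = posterior_model_probability(b12, rho12=1.0)  # equal prior model probabilities
print(b12, p1)
```

Working with logarithms of the integrated likelihoods avoids underflow when n grows. For models without closed forms, the integrals must be estimated, with the precautions discussed in the literature on Bayes factor computation.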


These model selection tools, which strongly differentiate Bayesian from classical
statistics, can notably help us to refine our knowledge of the behavior of the
parameter $\xi$, which is very important when characterizing the nature of the
differences between centennial, millennial, and decamillennial orders of magnitude (see
Example 11.32). Further pleasing properties of the Bayes factor are that it does not
need to be weighted as a function of a difference in dimension between competing
models,² and that it is sure to select, in a finite list of models and as
$n \to \infty$, the true model if it exists, or otherwise the best one according to an
information-theoretic loss [89]. We recommend the articles [34, 384, 425, 776] for a
more detailed introduction to understanding and calculating with Bayesian model
selection tools. A readable comparison of the Bayesian and frequentist approaches for
hypothesis testing is provided in [715]; [763] performs one such comparison for extreme
and non-extreme value models in natural hazard management.


2 As do, for example, classical asymptotic criteria like AIC [25] and BIC [693].


Remark 11.16 We strongly discourage using standard Monte Carlo techniques like

$$B_{12}(\mathbf{x}_n) \simeq \frac{M_1^{-1} \sum_{i=1}^{M_1} f_1(\mathbf{x}_n \mid \psi_{1i})}{M_2^{-1} \sum_{i=1}^{M_2} f_2(\mathbf{x}_n \mid \psi_{2i})},$$

where $(M_1, M_2) \gg 1$, $\psi_{1i} \overset{iid}{\sim} \pi_1(\psi_1)$, and
$\psi_{2i} \overset{iid}{\sim} \pi_2(\psi_2)$, to estimate a Bayes factor (11.5). Doing
so leads to high instability in calculations [140]. More robust estimation techniques
are based on posterior simulation of $\psi_1$ and $\psi_2$ [778]. Note that a
promising model selection methodology based on mixtures, making it possible to
dispense with the sometimes tricky estimation of the Bayes factor (notably when the
prior measures are improper, cf. Sect. 11.2.1), was recently proposed by [422].


Example 11.32 (Torrential rain) 


A model selection exercise for choosing between Gumbel, Fréchet, and 
Weibull distributions was performed by [590] in the POT setting on snow- 
fall data and by [118] in the MAXB setting on the rainfall data in Example 
11.34. In the latter, an alternative method for calculating the Bayes factor, 
inspired by [422], is used. Lastly, in a regional study by [634], a Bayesian
selection method for the parameter $\xi$ was proposed.


11.1.4 Decision-Making for Risk Mitigation 


More generally, the Bayesian framework is well suited to formalizing and making
mitigation decisions under uncertainty [337, 452, 708]. Hypothesis testing and
model selection can also be defined under this framework [651]. Decision-making
is conditional on the choice of:


1. A function of interest $h(\psi)$. In the extreme hazards context, this may
   typically be

   • the identity $h(\psi) = \psi$, if we want to estimate the parameter vector $\psi$;
   • a return level, calculated in the MAXB or POT setting;
   • the probability $1 - F(x_0 \mid \psi)$ that a future hazard will exceed a fixed
     level $x_0$.

   As $\psi$ is treated as random in the Bayesian setting, $h(\psi)$ is random too. We
   therefore denote by $\pi(h(\psi) \mid \mathbf{x}_n)$ its posterior density.


2. A cost function $L(z, h(\psi))$ which measures the cost or error resulting from
   the pointwise choice of z for estimating the random variable $h(\psi)$; indeed, a
   Bayesian-type decision means choosing a unique representative value which
   minimizes on average this cost function over the uncertainty carried by $h(\psi)$,
   expressed by $\pi(\psi \mid \mathbf{x}_n)$:

   $$\widehat{h(\psi)} = \arg\min_{z \in h(\Psi)} \int_{\Psi} L(z, h(\psi))\, \pi(h(\psi) \mid \mathbf{x}_n)\, dh(\psi). \tag{11.6}$$

   The choice of a cost function integrated over the posterior distribution
   corresponds to the choice of an expected gain. Generally speaking, the cost
   function depends on the study context as it possesses an economic interpretation
   [591]. Below are two classical examples of cost functions, which may be seen as
   first-order expansions of more complicated cost functions that usually remain
   unknown to the statistician.


The quadratic cost function $L^{quad}$ defined by

$$L^{quad}(z, h(\psi)) = (z - h(\psi))^2, \tag{11.7}$$

applied to a function of interest $h(\psi)$, provides as its “natural” estimator the
posterior expectation (or mean)

$$\widehat{h(\psi)} = E[h(\psi) \mid \mathbf{x}_n] = \int_{\Psi} h(\psi)\, \pi(\psi \mid \mathbf{x}_n)\, d\psi.$$

This symmetric and convex cost function penalizes large discrepancies between the
pointwise estimator $\widehat{h(\psi)}$ and the random variable $h(\psi)$ strongly
and symmetrically. It is therefore useful for estimating location and scale
parameters in extreme value models.


The piecewise linear cost function L° is defined as follows for (11.6): 


L°(z, hY) = Cilz — hey + Calz — AW) esac): 


It involves asymmetric costs: C1(z — h(yw)) of overestimating h(y) if it is less 
than z, and C2(h(y) — z) of underestimating h(y) if it is greater than z. The 
estimator ad) that comes out of this is the quantile of order a = C2/(C; + C2) 
of the posterior distribution of h(w) [591]: 


RG) 


a =P (nw) = ich) = J m (h(Y)Ixn) dh(w). 


—00 
Clearly, this type of convex cost function is good for one-dimensional functions
of interest, like return levels, for which overestimation corresponds to
a conservative approach in terms of risk. Indeed, underestimating the return
level is intuitively more costly than overestimating it ($C_2 > C_1$). The cost
of overestimation is the cost of needlessly strong mitigation, while that of
underestimation is essentially the damage caused by the natural hazard.


An engineer can then easily communicate the risk level (or corresponding cost
ratio) associated with their choice of the estimator; the posterior probability of
having poorly evaluated the return level in the non-conservative direction can be
estimated by $C_1/(C_1 + C_2) = 1 - \alpha$.


Example 11.33 (Flood protection) 


More detailed examples of the development of cost functions well adapted to
dike dimensioning for mitigating the consequences of flooding can be found in
[591, 592]. The link between decision-making support in dimensioning and
the selection of an extreme value statistical model is also described in [452],
which deals with cases of accidental chemical release, seen as random events.


It is common, if no cost function stands out, to examine the stability of a Bayesian
estimator by testing a variety of them. A practical way to do so is to use, in addition
to the functions mentioned above, the linex family (linear exponential). This allows
us to pool certain good properties of the previously mentioned functions, which is
particularly of interest in finance when estimating quantiles (or value-at-risk) [805].
This family is defined by

$$L_c(z, h(\psi)) = \exp\{c (z - h(\psi))\} - c (z - h(\psi)) - 1$$

for $c \in \mathbb{R}$. When $c > 0$, the function is asymmetric around 0; overestimation is
penalized more than underestimation. As $|z - h(\psi)| \to \infty$, it increases almost
exponentially when $z > h(\psi)$ and almost linearly when $z < h(\psi)$. When $c < 0$, the
two regimes are swapped. Lastly, when $|z - h(\psi)|$ is close to 0, the cost function is
close to $c^2 (z - h(\psi))^2/2$.
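
The limiting behaviors of the linex cost can be checked with a short expansion. The
sketch below uses the shorthand $u = z - h(\psi)$ for the estimation error (a notation
introduced here for convenience only) and assumes $c > 0$:

```latex
\begin{align*}
L_c &= e^{cu} - cu - 1, \\
u \to 0 &: \quad L_c = \tfrac{1}{2} c^2 u^2 + O(u^3)
  && \text{(locally quadratic)}, \\
u \to +\infty &: \quad L_c \sim e^{cu}
  && \text{(overestimation, exponential growth)}, \\
u \to -\infty &: \quad L_c = c|u| - 1 + o(1)
  && \text{(underestimation, linear growth)}.
\end{align*}
```

For $c < 0$ the roles of the two tails are exchanged, since $L_c(u) = L_{-c}(-u)$.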


Remark 11.17 Note that when using a sample $h(\psi_1), \ldots, h(\psi_M)$, where the
$\psi_i$ are drawn from the distribution $\pi(\psi|\mathbf{x}_n)$, the integral on the
right-hand side of Eq. (11.6) can be easily approximated using standard Monte Carlo
sampling if $M$ is large:

$$\int_{\Psi} L(z, h(\psi))\, \pi(h(\psi)|\mathbf{x}_n)\, dh(\psi) \simeq \frac{1}{M} \sum_{i=1}^{M} L(z, h(\psi_i)).$$
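
As an illustration of this remark, the following Python sketch minimizes the Monte
Carlo average cost over a grid of candidate values $z$. It is only a minimal sketch:
the gamma sample stands in for posterior draws of a return level, and the function
names (`bayes_estimate`, `piecewise_linear_cost`) and the costs $C_1 = 1$, $C_2 = 9$
are assumptions made for the example, not taken from the text.

```python
import numpy as np


def quadratic_cost(z, h):
    """Quadratic cost (z - h)^2, as in (11.7)."""
    return (z - h) ** 2


def piecewise_linear_cost(c1, c2):
    """Cost c1 per unit of overestimation and c2 per unit of underestimation."""
    def cost(z, h):
        diff = z - h
        return np.where(diff >= 0.0, c1 * diff, -c2 * diff)
    return cost


def bayes_estimate(h_samples, cost, n_grid=2001):
    """Minimize the Monte Carlo average of cost(z, h_i) over a grid of z values."""
    grid = np.linspace(h_samples.min(), h_samples.max(), n_grid)
    risks = np.array([np.mean(cost(z, h_samples)) for z in grid])
    return float(grid[np.argmin(risks)])


# Hypothetical stand-in for posterior draws h(psi_1), ..., h(psi_M).
rng = np.random.default_rng(0)
h_samples = rng.gamma(shape=4.0, scale=25.0, size=5000)

z_quad = bayes_estimate(h_samples, quadratic_cost)
z_quantile = bayes_estimate(h_samples, piecewise_linear_cost(1.0, 9.0))

print(z_quad, h_samples.mean())                 # grid minimizer vs posterior mean
print(z_quantile, np.quantile(h_samples, 0.9))  # grid minimizer vs 0.9 quantile
```

With $C_1 = 1$ and $C_2 = 9$, the order of the quantile is $C_2/(C_1 + C_2) = 0.9$, so
the second estimate should sit near the empirical 90% quantile of the draws, up to the
grid resolution.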
11.1.5 A Practical Review of the Bayesian Framework


Why not work exclusively in the Bayesian setting then? It gives us deeper and more
systematic insights into managing uncertainty, while also incorporating knowledge
other than observed data when doing statistical modeling. In theory, the methods
and tools mentioned in earlier chapters can be carried over into this framework and
provide equivalent if not better results, as explained by numerous authors (see for
example [293] for forecasting extreme wind gusts). In the usual extreme value study
context, the capacity of the Bayesian approach to take information in addition to
observed data into account is welcome. In this sense, the usefulness of the Bayesian
setting for industrial operators confronted with extreme natural hazards, and for
regulatory authorities, was highlighted as early as 1993 by Lambert and his
coauthors [452].

Nevertheless, using Bayesian statistics in an industrial setting is not entirely 
straightforward, for the following reasons: 


1. construction/choice of a prior distribution $\pi(\psi)$ is not necessarily obvious and
can lead to much criticism [715]; this can be seen as introducing unwanted
subjectivity to the problem;

2. calculating an estimator $\hat{\psi}_n$ is replaced by estimating a posterior distribution
$\pi(\psi|\mathbf{x}_n)$, which can require specific, time-intensive algorithms whose
convergence is difficult to establish in practice;

3. a gain in terms of precision in the estimation of a quantity of interest (like a 
return level, for example) cannot be systematically ensured; 

4. engineers or researchers in charge of a study do not have all of the elements 
required to run a full Bayesian analysis. For instance, the cost function L, indis- 
pensable for determining an estimator of quantities of interest, is not available to 
them (only to decision-makers). Furthermore, there are no guarantees that Equa- 
tion (11.6), based on the expected gain, is adequate for dealing with problems 
involving extreme hazard mitigation (see Sect. 11.3.2.1). 


Such arguments can be partially countered. First of all, it is possible to make 
an objective choice of prior distribution in numerous situations (see Sect. 11.2), 
even if the search for a perfect statistic free of all subjectivity seems hopeless (if 
not counterproductive) according to philosophers of science [225]. In the above- 
mentioned situations, the algorithms required to estimate the posterior distribution 
are well understood, and methods being developed for more complex cases continue 
to benefit from the growing power of computers (see Sect. 11.4). 

The search for an improvement in estimation is a problem related to the com- 
parison between a simulated model (for which the parameters are known and fixed) 
and an estimated model; de facto, such a comparison is biased because the compar- 
ison tools presuppose that the classical model is the norm. A fair comparison would 
involve comparing the location of actual observations within statistical confidence 
or credibility regions. Thereupon, Bayesian statistics prove to be highly competitive 
in extreme value modeling [175-177]. 

Lastly, the search for cost functions can be simplified: either results can be
provided to decision-makers in the form of the posterior distribution itself, leaving
them to make use of it, or estimates of $\psi$ and functions thereof can be provided by
the engineer through a choice of cost functions which approximate (to first order)
more complex and/or unknown ones [651], such as those presented in Sect. 11.1.4.

However, the use of Bayesian methods in extreme value statistics remains essen- 
tially, for the time being, in the academic domain. The main reasons for reluctance 
in using them further are probably due to the following: 


(a) The difficulty in defining a prior modeling methodology that is as universal and
general as that which consists of maximizing a likelihood. Practical strategies
are often perceived as being case-specific;

(b) A certain degree of statistical conservatism from applied research engineers, 
a behavior found in numerous applied settings [471] and the vast majority of 
public policy-making worldwide [276]. 


As stated earlier, this chapter aims above all to provide an up-to-date view of the 
field to engineers and researchers who are perhaps interested in refining initial studies 
performed with the help of classical statistics, as well as looking at how to adapt this 
framework to real-world case studies. 


11.2 Prior Modeling 


The choice of a measure $\pi(\psi)$ is no simple task. While there is still no comprehensive
prior modeling methodology with as much universality as extreme value theory itself,
solid principles, summarized for example in [82, 789], can be implemented and
used by practitioners.


11.2.1 Choosing a Reference Measure 


If no prior information other than the parametric model $X|\psi \sim f(x|\psi)$ is available,
the chosen measure $\pi(\psi)$ has to be considered a benchmark prior, usually called a
“noninformative prior”³, which acts to regularize the likelihood [79]. The choice and
construction of this measure obey certain formal rules based for the most part on the
invariance of prior information and a weak sensitivity of the posterior distribution
$\pi(\psi|\mathbf{x}_n)$ to the choice of $\pi(\psi)$. The main formal rules are given and
discussed in [426]. Several common measures are described below.


• The Jeffreys prior [412] is defined by

$$\pi^{J}(\psi) \propto \sqrt{\det I_{\psi}},$$


3 Even though this term is overused, precisely due to our knowledge of $X|\psi \sim f(x|\psi)$ [408].
where $I_{\psi}$ is the Fisher information associated with the sampling model $f(x|\psi)$.
This measure has the property of being invariant with respect to any bijective
reparametrization $\psi \mapsto h(\psi)$, which is desirable when proposing a formal
definition of “noninformative”.


Remark 11.18 (Uniform measure (Laplace’s prior)) 

The above property is not valid for the classical uniform measure derived from 
Laplace’s principle of indifference (see for example [230]). It can therefore not be 
considered as noninformative, and we should beware of its use in practical situations 
as it can lead to paradoxes [426]. 


• The Berger-Bernardo reference prior $\pi^{BB}(\psi)$ [80] is constructed in such a way
that its posterior distribution $\pi^{BB}(\psi|\mathbf{x})$ is maximally deformed or enriched by
each dataset $\mathbf{x} \in \mathcal{X}$ produced by the Bayesian model

$$X \sim f(\mathbf{x}) = \int_{\Psi} f(\mathbf{x}|\psi)\, \pi(\psi)\, d\psi.$$

The measure $\pi^{BB}(\psi)$ can thus be interpreted as the one that has the least influence
on the posterior distribution, on average, over the set of possible datasets $\mathbf{x}$. The
idea of deformation of a probability distribution $\pi(\psi|\mathbf{x}_n)$ with respect to
$\pi(\psi)$ is defined here with the help of the Kullback-Leibler divergence⁴ between
$\pi(\psi|\mathbf{x}_n)$ and $\pi(\psi)$:

$$KL(\pi(\cdot|\mathbf{x}_n), \pi) = \int_{\Psi} \pi(\psi|\mathbf{x}_n) \log \frac{\pi(\psi|\mathbf{x}_n)}{\pi(\psi)}\, d\psi.$$

The mean of this divergence over $f(\mathbf{x})$ defines the mutual information between
$\psi$ and $X$ [185]:

$$I_n(\psi, X) = \int_{\mathcal{X}} KL(\pi(\cdot|\mathbf{x}), \pi)\, f(\mathbf{x})\, d\mathbf{x}.$$

Thus,

$$\pi^{BB}(\psi) = \arg\max_{\pi(\psi)} I_n(\psi, X),$$

where the size of the set of random variables $X$ tends to infinity. Invariance of
mutual information guarantees that the reparametrization invariance property holds
for $\pi^{BB}(\psi)$.


Remark 11.19 In multidimensional settings, when constructing $\pi^{BB}(\psi)$, it is
possible to distinguish between two types of parameter for each component of $\psi$.


4 Or the negative (relative) entropy between $\pi(\psi|\mathbf{x})$ and $\pi(\psi)$.
Parameters of interest are typically those that play a part in prediction. Nuisance
parameters, on the other hand, are generally those linked to the observation process 
(typically bias or variance in observations) whose influence we would like to curb 
when making predictions [60]. We do not go down this path in the rest of the chapter; 
further details can be found in [80]. 


• The Maximal Data Information Prior (MDIP) [795, 797] is constructed so as to
maximize the average information $G(\pi)$ contained in the density $f(x|\psi)$ relative
to that found in the prior measure. If we denote by $H(\psi) = -E_X[\log f(X|\psi)]$
the entropy of $f(x|\psi)$, and by

$$H_{\pi} = -E_{\psi}\left[\log \frac{\pi(\psi)}{\pi_0(\psi)}\right]$$

the entropy of $\pi(\psi)$ relative to a measure $\pi_0(\psi)$ (chosen by default), then
$G(\pi) = E_{\psi}[H_{\pi} - H(\psi)]$ and the MDIP is defined by

$$\pi^{MDI}(\psi) = \arg\max_{\pi(\psi) \ge 0} G(\pi)
= \arg\max_{\pi(\psi) \ge 0} \left\{ \int_{\Psi} \pi(\psi)\, E_X[\log f(X|\psi)]\, d\psi - \int_{\Psi} \pi(\psi) \log \frac{\pi(\psi)}{\pi_0(\psi)}\, d\psi \right\},$$

which gives

$$\pi^{MDI}(\psi) \propto \pi_0(\psi) \exp\left(E_X[\log f(X|\psi)]\right).$$

Though $\pi^{MDI}(\psi)$ is no longer systematically reparametrization invariant, it can
be made so for a finite number of useful reparametrizations of $\psi$ [426]. This class
of prior measures often performs well with respect to other approaches. It allows
us to more frequently obtain informative prior measures when extra constraints
are added [798], by virtue of the principle of maximum entropy (see Sect. 11.2.2).
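
The MDIP formula lends itself to a simple numerical check. The sketch below
estimates $E_X[\log f(X|\psi)]$ by Monte Carlo for a generalized Pareto density with
threshold 0 and compares it with the closed form $-\log\sigma - 1 - \xi$, which
corresponds to the kernel $\exp(-1 - \xi)/\sigma$ listed for the GPD in Table 11.2 when
$\pi_0$ is uniform. The helper names, the parameter values and the sample size are
illustrative assumptions.

```python
import numpy as np


def gpd_sample(xi, sigma, size, rng):
    """Draw GPD excesses (threshold 0, xi != 0) by inverting the distribution function."""
    u = rng.uniform(size=size)
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)


def gpd_logpdf(y, xi, sigma):
    """Log-density of the GPD for xi > 0 and y >= 0."""
    return -np.log(sigma) - (1.0 / xi + 1.0) * np.log1p(xi * y / sigma)


def mdi_log_kernel_mc(xi, sigma, size=200_000, seed=0):
    """Monte Carlo estimate of E_X[log f(X | xi, sigma)], the log MDIP kernel."""
    rng = np.random.default_rng(seed)
    y = gpd_sample(xi, sigma, size, rng)
    return float(np.mean(gpd_logpdf(y, xi, sigma)))


xi, sigma = 0.2, 3.0
mc_value = mdi_log_kernel_mc(xi, sigma)
closed_form = -np.log(sigma) - 1.0 - xi   # log of exp(-1 - xi) / sigma

print(mc_value, closed_form)
```

The agreement follows from the fact that $\log(1 + \xi X/\sigma)$ is $\xi$ times a
standard exponential variable when $X$ follows the GPD, so its expectation is $\xi$.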


Equalities like $\pi^{J}(\psi) = \pi^{BB}(\psi)$ in unidimensional situations, the study of
desirable theoretical behavior [164], and the quality of consistency properties of
Bayesian estimators [426] can be used to support the choice or rejection of a
prior measure. Other properties can be used to construct a hierarchy, such as for
instance posterior frequentist coverage matching [718], which measures the
closeness between maximum likelihood and noninformative Bayesian results. These prior
measures are generally improper, i.e., non-integrable:

$$\int_{\Psi} \pi(\psi)\, d\psi = \infty,$$

and are therefore not probability measures. Recent work has allowed us to see them
as limiting cases of proper (integrable) probability measures for a certain topology
[95]. The lack of prior integrability is not prohibitive as long as the density of the
posterior distribution $\pi(\psi|\mathbf{x}_n)$ remains integrable, as it is this which allows
estimation and statistical prediction to be possible.


As for the use of these priors in the extreme value distribution framework,
Tables 11.1 and 11.2 summarize known noninformative priors for GEV and GPD
distributions. While the Berger-Bernardo reference prior usually works better than
the Jeffreys one in multidimensional settings [426], the former remains difficult to
construct [80]. Indeed, it has not yet been published for the GEV case. The
Berger-Bernardo reference prior is appreciated for its good posterior frequentist
coverage matching and is known explicitly⁵ for the Fréchet sub-model [21]:

$$\pi^{BB}(\mu, \sigma, \xi) \propto \frac{1}{\sigma \xi}, \qquad (11.8)$$

and for the Weibull sub-model too [725]:

$$\pi^{BB}(\mu, \sigma, \xi) \propto \frac{1}{\sigma \xi}. \qquad (11.9)$$

As for the Gumbel distribution, only the Jeffreys prior has been obtained, by [752]:

$$\pi^{J}(\mu, \sigma) \propto \frac{1}{\sigma^{2}}.$$


In practice, it is recommended—when several measures are in competition for a 
specific applied setting with a fixed number of data n—to run studies simulating 
the posterior distribution in order to select the prior reference measure, looking in 
particular at the proximity of a classical Bayesian estimator (typically the posterior 
expectation) to the parameter values chosen in the simulation, as well as the frequen- 
tist coverage matching property. 


An alternative approach to using improper measures is to use prior probability 
measures that are “close” to them in a certain sense (in general, by letting hyperpa- 
rameters of these distributions tend toward specific values). For instance, when the 
domain of attraction of an extreme value distribution is Weibull [790], the use of 
distributions constructed in this way shows that biases or inconsistencies linked to 
the use of multidimensional Jeffreys posterior estimators can disappear. 


5 Results (11.8) and (11.9) can be obtained by reparametrizing the choices made in [21, 725]. 
Table 11.1 Characteristics of prior reference measures and their posterior distributions for n data
points under the MAXB (GEV distribution) approach: $\psi = (\xi, \mu, \sigma)$. Here, $\gamma$ refers to Euler's
constant ($\gamma \simeq 0.57722$)

Prior measure | Posterior measure | Reference(s)
Jeffreys, defined if $\xi > -1/2$ | improper $\forall n > 0$ | [443, 563]
MDIP (truncated), $\pi^{MDI}(\psi) \propto \exp\{-\gamma (1 + \xi)\}/\sigma$ with $\xi \ge -1$ | proper $\forall n \ge 2$ | [66, 563]
Log-uniform, $\pi(\psi) \propto \mathbb{1}_{\{\sigma > 0\}}/\sigma$ | proper if $n \ge 4$ | [563]
Northrop & Attalides, $\pi(\psi) \propto \pi(\xi)/\sigma$ with proper $\pi(\xi)$ | proper $\forall n \ge 2$ | [563]

Table 11.2 Characteristics of prior reference measures and their posterior distributions for n data
points under the POT (generalized Pareto distribution) approach: $\psi$ is restricted to the parameters
$(\xi, \sigma)$ (the threshold $u$ is considered known or estimated)

Prior measure | Posterior measure | Reference(s)
Jeffreys, $\pi(\psi) \propto 1/\left(\sigma (1 + \xi) \sqrt{1 + 2\xi}\right)$ | Proper | [257]
MDIP (truncated), $\pi^{MDI}(\psi) \propto \exp(-1 - \xi)/\sigma$ with $\xi \ge -1$ | Proper $\forall n > 0$ | [66, 563]
Pickands (log-uniform), $\pi(\psi) \propto \mathbb{1}_{\{\sigma > 0\}}/\sigma$ | Proper if $n \ge 3$ | [563, 617]
Northrop & Attalides, $\pi(\psi) \propto \pi(\xi)/\sigma$ with proper $\pi(\xi)$ | Proper $\forall n > 0$ | [563]


Example 11.34 (Torrential rain in Corsica) 


We consider the set of 29 yearly precipitation maxima (Fig. 11.2) measured
or reconstructed in the village of Penta-di-Casinca (Corsica), which were
analyzed by [674]. In the MAXB setting, under the stationarity hypothesis, the
plots in Fig. 11.3 show a comparison, for each of the GEV parameters, between
the distribution of the Maximum Likelihood Estimator (MLE) and the
posterior distributions of the MDIP and log-uniform priors (Table 11.1). We see
that the ranges of values are similar for the three parameters, though the MLE
distribution is generally more informative than the two posterior distributions,
which incidentally are asymmetric for $\sigma$.
Fig. 11.2 Maximum annual precipitation per day (in mm) in Penta-di-Casinca, Corsica. Source:
http://penta.meteomac.com


11.2.2 Selecting a Prior Algebraic Structure 


If supplementary information is available, one or several parametric choices must be
made for $\pi(\psi)$. One simple idea [481] for guiding the algebraic structure of $\pi(\psi)$
is to consider that

(a) prior information is expressed in a “hidden” way in the form of a virtual iid
sample $\tilde{\mathbf{x}}_m$;

(b) truly available information provides us with (direct or indirect) knowledge of
certain features of this sample;

(c) the prior distribution $\pi(\psi)$ should ideally be defined as the posterior distribution
of $\psi$ given $\tilde{\mathbf{x}}_m$ and a noninformative prior $\pi^{NI}$:

$$\pi(\psi) = \pi^{NI}(\psi|\tilde{\mathbf{x}}_m). \qquad (11.10)$$


Besides guiding the choice of the algebraic form of the prior distribution, this
idealized vision has several merits [789]. First, the correlations expressed in $\pi(\psi)$
between the coordinates of $\psi$ are coherent with the nature of supposed or observed
data. Second, the number $m$ of virtual data is an intuitive calibration tool which can
be directly compared with the number $n$ of real data $\mathbf{x}_n$⁶. The combination of several


6 Or rather with the effective level of information carried by the data $\mathbf{x}_n$, some of which may be
only partially informative (e.g., censored); see Sect. 5.3.3 of [626, 651].
prior distributions (a well-known situation to specialists [572]) is then equivalent
to aggregating several virtual datasets and operates directly via Bayes' rule if
potential correlations between information sources can be removed. Other benefits are
highlighted in [480, 528, 549].

The following definition introduces a practical vocabulary element for handling 
parametric Bayesian laws: hyperparameters. 


Definition 11.26 (Hyperparameters) 

In the rest of the chapter, we will concentrate on the use of parametric prior
distributions and call hyperparameters the parameters of $\pi(\psi)$ and $\pi(\psi|\mathbf{x}_n)$,
to distinguish them from those in $\psi$, which contains the extreme value distribution's
parameters.


Constraining the form via meaningful hyperparameters (in terms of information
quantity relative to that brought by the real data) is a necessity in the extreme value
Bayesian framework; Coles [177] has indeed shown that a purely subjective prior
distribution can uncontrollably take precedence over information contained in the
data when estimating parameters and, in particular, over the fundamental parameter $\xi$
that controls the heaviness of the distribution's tails.

When the parametric model $f(x|\psi)$ belongs to the natural exponential family⁷
[651], applying Formula (11.10) allows us to construct the major part of the conjugate
prior family of distributions, whose form is explicit and remains stable
when conditioning on data (for more details on the extent and properties of this
family, see [279]). For such models, the virtual size $m$ and the mean of virtual data
define natural hyperparameters [789].
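
The virtual-sample reading of Formula (11.10) can be sketched on the simplest
member of the exponential family. The example below is an illustration chosen here
(it is not an extreme value model): data are exponential with rate $\lambda$, the
noninformative prior is taken as $\pi^{NI}(\lambda) \propto 1/\lambda$, and a virtual
sample of size $m$ with mean `virtual_mean` then yields a gamma prior with shape $m$
and rate $m \times$ `virtual_mean`. The function names and numerical values are
hypothetical.

```python
import numpy as np


def prior_from_virtual_sample(m, virtual_mean):
    """Gamma(shape, rate) prior on the exponential rate, read as the posterior of
    pi_NI(lambda) ~ 1/lambda given m virtual observations with the stated mean."""
    return float(m), float(m * virtual_mean)


def update(shape, rate, data):
    """Conjugate update: each real observation adds 1 to the shape and x_i to the rate."""
    data = np.asarray(data, dtype=float)
    return shape + data.size, rate + float(data.sum())


m, virtual_mean = 5, 2.0
shape0, rate0 = prior_from_virtual_sample(m, virtual_mean)

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.5, size=20)          # n = 20 real observations
shape1, rate1 = update(shape0, rate0, x)

pooled_mean = rate1 / shape1                     # mean of virtual and real data pooled
print(shape1, pooled_mean, shape1 / rate1)       # size m + n, pooled mean, posterior mean of lambda
```

The posterior has the same gamma form as the prior, with the virtual size $m$ simply
added to $n$. This makes the relative weight of prior information and data explicit.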

However, extreme value statistical distributions do not accept conjugate prior 
measures over the totality of their parameter vector, except for the Gumbel distribu- 
tion, so only semi-conjugate approaches can be proposed. Currently, we distinguish 
between three principles of prior modeling. 


(a) Choose the form of the prior based on an approximation of the posterior
distribution (11.10) or based on semi-conjugation properties. Examples can be found in
Sect. 11.2.3 for the POT and MAXB approaches.


(b) Justify another formal rule allowing us to select the algebraic form from
specifications other than a virtual sample. In this vein, an important approach is
the principle of maximum entropy [411]. This provides a measure $\pi^{*}(\psi)$ by
maximizing its entropy relative to a noninformative measure $\pi_0$:

$$\pi^{*}(\psi) = \arg\max_{\pi(\psi) \ge 0} \left\{ -\int_{\Psi} \pi(\psi) \log \frac{\pi(\psi)}{\pi_0(\psi)}\, d\psi \right\}$$


7 Which includes the normal, gamma, and multinomial distributions, among others.
Fig. 11.3 Posterior densities obtained with noninformative prior distributions on $(\mu, \sigma, \xi)$ and
the density of the maximum likelihood estimator obtained using the delta method, for a GEV model
of maximal annual daily rainfall in Corsica (Casinca). The Bayesian calculations were run using a
Metropolis-Hastings-within-Gibbs algorithm with 6 parallel chains over 50,000 iterations
(cf. Sect. 11.4)
under linear constraints in $\pi$: $\forall i \in \{1, \ldots, q\}$,
$\int_{\Psi} h_i(\psi)\, \pi(\psi)\, d\psi = c_i$. Solving this type of functional optimization
problem using Lagrange multipliers [626] gives the explicit result:

$$\pi^{*}(\psi) \propto \pi_0(\psi) \exp\left(\sum_{i=1}^{q} \lambda_i h_i(\psi)\right), \qquad (11.11)$$

where the Lagrange multipliers $\lambda_i \in \mathbb{R}$ must be estimated using the linear
constraints. This strategy nevertheless suffers from several weak points, including:

• the reference measure $\pi_0$ is typically chosen to be uniform for practical reasons,
whereas it should in reality have good properties like reparametrization invariance
($\pi_0$ thus should intrinsically depend on the algebraic structure of $f(x|\psi)$);

• the normalization constraint $\int_{\Psi} \pi(\psi)\, d\psi < \infty$, which encodes the fact that we
want a proper prior, is not automatically satisfied⁸. However, bringing together the
maximum entropy approach and MDIP construction rules can improve the chance of
obtaining an integrable $\pi^{*}(\psi)$ [114];

• the algebraic form of the prior distribution can fluctuate wildly as a function of the
linear constraints, and the Lagrange multipliers (which define a family of
hyperparameters) are not easily interpretable.


(c) Arbitrarily choose a common distributional form and check that its features (the
tails of the distribution in particular) have little influence on the posterior results.
This general approach has been pushed by Coles [173, 175, 177, 179], who
often uses independent truncated normal (or gamma) prior distributions on the
parameters of GEV distributions. The reasoning here is practical: this method
simplifies the computations necessary to obtain the posterior distribution of these
parameters (see Sect. 11.4).


In particular, in [173] Coles shows the feasibility of the Bayesian approach on
yearly sea level maxima data at Port Pirie (Australia) by putting independent
normal priors with high variance on GEV parameters. This approach, also chosen
by [706] (Chap. 3) for modeling rainfall, corresponds to a crude approximation
of a noninformative approach. The choice of priors remains tricky in practice
since it can lead to physically implausible models (for instance in the case of
regional analyses [634]).
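
To make principle (b) more concrete, the following sketch computes a maximum
entropy prior numerically in the simplest setting: a single constraint ($q = 1$) with
$h_1(\psi) = \psi$, a uniform reference measure $\pi_0$ on a bounded grid, and a
target mean $c_1 = 3$. All of these choices, as well as the function name
`maxent_prior` and the bisection bounds, are assumptions made for illustration.
The Lagrange multiplier is found by bisection, using the fact that the constrained
moment increases with the multiplier.

```python
import numpy as np


def maxent_prior(grid, h_values, target, lam_lo=-50.0, lam_hi=50.0, n_iter=200):
    """Find lambda such that pi*(psi) ~ exp(lambda * h(psi)) has E[h] = target,
    for a uniform reference measure on an evenly spaced grid."""
    dx = grid[1] - grid[0]

    def density(lam):
        log_w = lam * h_values
        w = np.exp(log_w - log_w.max())          # stabilised exponential tilt
        return w / (w.sum() * dx)

    def moment(lam):
        return float(np.sum(h_values * density(lam)) * dx)

    for _ in range(n_iter):                      # bisection on the multiplier
        mid = 0.5 * (lam_lo + lam_hi)
        if moment(mid) < target:
            lam_lo = mid
        else:
            lam_hi = mid
    lam = 0.5 * (lam_lo + lam_hi)
    return lam, density(lam)


grid = np.linspace(0.0, 60.0, 6001)
lam, prior = maxent_prior(grid, h_values=grid, target=3.0)
print(lam)   # close to -1/3: the solution is nearly an exponential density with mean 3
```

In this toy case the multiplier has a direct interpretation (minus the inverse of the
prescribed mean). With several constraints the multipliers generally lose such a
reading, which is the last weak point listed above.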


Remark 11.20 Though this chapter looks at extreme value distributions coming 
from the asymptotic theorems presented in Chap. 6, certain authors equally employ 
Bayesian approaches for other distributions, justifying this with the use of statistical 
tests. For instance, [491, 763] propose using generalized exponential distributions 


8 Indeed, the linear constraint here is the identity ($h_i(\psi) = 1$), which plays no role in Formula
(11.11) as it is folded into the underlying proportionality constant.
[366] for modeling extreme river flows and high precipitation, respectively. These
are defined by the cumulative distribution function

$$P(X < x|\psi) = \{1 - \exp(-\lambda x)\}^{\alpha} \quad \text{with } \psi = (\alpha, \lambda) \in \mathbb{R}_{+}^{2}.$$


They then perform calculations in a very similar way to Coles and his coauthors using 
weakly informative gamma priors. An essentially identical study has been performed 
in [490] with the Pareto distribution for modeling temperature maxima. 


11.2.3 Conjugate and Semi-conjugate Prior Modeling 


Semi-conjugate priors $\pi(\psi)$ are those whose posterior structure is partially
stable. A common situation is to have, for certain coordinates of the parameter $\psi$,
conditional conjugation on the other coordinates of $\psi$. Here, some hyperparameters
in $\pi(\psi)$ can still be viewed as the size and statistics of a virtual sample, which can
help calibration.


11.2.3.1 The POT Setting 


First approach. Recall that the conditional POT approach consists of estimating
a model defined by the cumulative distribution function of the Generalized Pareto
Distribution (GPD)

$$P(X < x \mid X \ge u, \psi) = \begin{cases} 1 - \left(1 + \xi\, \dfrac{x - u}{\sigma}\right)^{-1/\xi} & \text{for } \xi \ne 0, \\ 1 - \exp\left(-\dfrac{x - u}{\sigma}\right) & \text{for } \xi = 0, \end{cases}$$

using data $x_1, \ldots, x_n$ with values greater than $u$. The unconditional approach,
examined by [589, 770], consists in setting the value of $u$ and adopting a POT
approach, which brings into the study the counting process of excesses
$\mathbb{1}_{\{x_k \ge u\}}$ over the sampling time period $T$. Under a classical independence
hypothesis between exceedance events, this is a homogeneous Poisson process with
intensity $\lambda$:

$$P(N_T = n) = \frac{(\lambda T)^{n}}{n!} \exp(-\lambda T), \qquad (11.12)$$

where $N_T$ is the (random) number of times $u$ is exceeded during the time period $T$.
The complete likelihood of the data $x_1, \ldots, x_n$ is then

$$\begin{aligned}
& P(X_1 = x_1, \ldots, X_n = x_n, X_1 \ge u, \ldots, X_n \ge u \mid T) \\
&\quad = P(X_1 = x_1, \ldots, X_n = x_n \mid X_1 \ge u, \ldots, X_n \ge u)\, P(X_1 \ge u, \ldots, X_n \ge u \mid T) && (11.13) \\
&\quad = f(\mathbf{x}_n|\psi)\, P(N_T = n), && (11.14)
\end{aligned}$$
where $f(\mathbf{x}_n|\psi)$ is the generalized Pareto likelihood, which can be written as [589]

$$f(\mathbf{x}_n|\psi) = \rho^{n} \exp\left\{(\rho - \beta)\, s_n(\mathbf{x}_n, \beta)\right\},$$

using the reparametrization $\beta = -\xi/\sigma$ and $\rho = 1/\sigma$, with

$$s_n(\mathbf{x}_n, \beta) = \frac{1}{\beta} \sum_{k=1}^{n} \log\left(1 - \beta (x_k - u)\right). \qquad (11.15)$$

Remark 11.21 When $\beta$ (or $\xi$) equals 0, the statistic (11.15) simplifies by continuity
to $s_n(\mathbf{x}_n, 0) = -\sum_{k=1}^{n} (x_k - u)$.


The parameter vector $\psi$ to be estimated then becomes $\{\mu, \rho, \beta\}$ (as $u$ is
constant), where $\mu$ denotes the intensity of the Poisson process, written $\lambda$ in
(11.12). A semi-conjugate prior modeling from [589] is given in Proposition 11.7, which
can be partly interpreted as a posterior distribution given a virtual sample. Note that
[589] proposes numerous calibration choices for the hyperparameters without necessarily
linking them with this interpretation. For a different parametrization, [770] provides
a similar methodology.


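Proposition 11.7 (GPD prior [589]) Let $\pi(\mu, \rho, \beta)$ be a prior distribution defined
by

$$\begin{aligned} \mu &\sim \mathcal{G}(m, t), \\ \rho|\beta &\sim \mathcal{G}(m, s(\beta)), \\ \beta &\sim \pi(\beta), \end{aligned}$$

where $\pi(\beta)$ is any density on $\beta \in \mathbb{R}$. Then, $\pi(\mu, \rho, \beta)$ is conjugate for the
likelihood (11.14) conditionally on $\beta$ and can be interpreted as the posterior
distribution of a sample of $m$ data values $\tilde{\mathbf{x}}_m$ greater than $u$ observed over time
period $t$ and such that

$$s(\beta) = -s_m(\tilde{\mathbf{x}}_m, \beta).$$

Indeed, given $\{\mathbf{x}_n, T\}$, the conditional posterior distributions are given by

$$\begin{aligned}
\mu|\mathbf{x}_n, T &\sim \mathcal{G}(m + n, t + T), \\
\rho|\beta, \mathbf{x}_n, T &\sim \mathcal{G}\left(m + n, s(\beta) - s_n(\mathbf{x}_n, \beta)\right), \\
\pi(\beta|\mathbf{x}_n, T) &\propto \frac{\Gamma(m + n)}{\Gamma(m)}\, \frac{\pi(\beta) \exp\{-\beta s_n(\mathbf{x}_n, \beta)\}}{s^{n}(\beta) \left[1 - s_n(\mathbf{x}_n, \beta)/s(\beta)\right]^{m+n}}.
\end{aligned} \qquad (11.16)$$

A Metropolis-Hastings-within-Gibbs algorithm then allows us to sample from the
posterior distribution (Sect. 11.4).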
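
The conjugate part of this proposition can be sketched in a few lines of Python. The
code below evaluates the statistic (11.15), checks numerically that the
reparametrized likelihood agrees with the usual GPD likelihood in $(\xi, \sigma)$, and
forms the gamma updates for the Poisson intensity and for $\rho$ given $\beta$, with
gamma distributions read in the (shape, rate) convention. The simulated excesses,
the virtual sample and the hyperparameter values are hypothetical, and the
Metropolis step needed for $\beta$ is deliberately left out.

```python
import numpy as np


def s_stat(excess, beta):
    """Statistic s_n(x_n, beta) of (11.15), with excess = x_k - u; continuous at beta = 0."""
    excess = np.asarray(excess, dtype=float)
    if abs(beta) < 1e-12:
        return -float(excess.sum())
    return float(np.sum(np.log1p(-beta * excess)) / beta)


def gpd_loglik_reparam(excess, rho, beta):
    """Log of rho^n exp{(rho - beta) s_n(x_n, beta)}."""
    return len(excess) * np.log(rho) + (rho - beta) * s_stat(excess, beta)


def gpd_loglik_standard(excess, xi, sigma):
    """Usual GPD log-likelihood in (xi, sigma), for xi != 0."""
    excess = np.asarray(excess, dtype=float)
    return float(np.sum(-np.log(sigma) - (1.0 / xi + 1.0) * np.log1p(xi * excess / sigma)))


def conditional_updates(m, t, s_beta, excess, beta, T):
    """(shape, rate) of the gamma conditionals for the intensity and for rho given beta."""
    n = len(excess)
    intensity_post = (m + n, t + T)
    rho_post = (m + n, s_beta - s_stat(excess, beta))
    return intensity_post, rho_post


rng = np.random.default_rng(2)
xi, sigma = 0.15, 2.0
excess = sigma / xi * ((1.0 - rng.uniform(size=40)) ** (-xi) - 1.0)   # simulated x_k - u
rho, beta = 1.0 / sigma, -xi / sigma

virtual_excess = np.array([0.5, 1.2, 2.0, 3.5, 0.8])                 # hypothetical virtual sample, m = 5
s_beta = -s_stat(virtual_excess, beta)

intensity_post, rho_post = conditional_updates(5, 10.0, s_beta, excess, beta, T=30.0)
rho_draw = rng.gamma(shape=rho_post[0], scale=1.0 / rho_post[1])      # one Gibbs draw of rho | beta
print(intensity_post, rho_post, rho_draw)
```

Since $s_n(\mathbf{x}_n, \beta)$ is always negative, the rate $s(\beta) - s_n(\mathbf{x}_n, \beta)$
of the gamma conditional for $\rho$ is positive, as required.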
Second approach. Restricting ourselves to the common case $\xi > 0$, an original
semi-conjugate approach for the distribution of the data $y_i = x_i - u$ (with $u$ supposed
known), which follows a GPD reparametrized by $(\alpha, \beta)$ with $\alpha = 1/\xi$ and
$\beta = \sigma/\xi$, was proposed by [220]. The density $f(y|\alpha, \beta)$ of this distribution
has the property of being able to be expressed as

$$f(y|\alpha, \beta) = \int_{0}^{\infty} z \exp(-yz)\, g(z|\alpha, \beta)\, dz, \qquad (11.17)$$

where $g(z|\alpha, \beta)$ is the density of the gamma distribution $\mathcal{G}(\alpha, \beta)$ [639]. By
introducing auxiliary variables $z_i \sim \mathcal{G}(\alpha, \beta)$, it is possible to define the
statistical likelihood of the data $y_i$ with respect to these $z_i$:

$$f(\mathbf{y}_n|\mathbf{z}_n) \propto \exp\left(-\sum_{i=1}^{n} y_i z_i\right),$$

which corresponds to a likelihood for exponential distributions. The full Bayesian
model is then

$$\begin{aligned} y_i &\sim \mathcal{E}(z_i), \\ z_i &\sim \mathcal{G}(\alpha, \beta), \\ \alpha &\sim \mathcal{G}_{II}(\eta/\lambda, m), \\ \beta|\alpha &\sim \mathcal{G}(m\alpha + 1, m\eta), \end{aligned} \qquad (11.18)$$

where $\mathcal{G}_{II}(c, d)$ is the Gamcon II distribution [194] with density

$$\pi(\alpha|c, d) \propto \frac{\Gamma(d\alpha + 1)}{\Gamma^{d}(\alpha)}\, (cd)^{-d\alpha}.$$

The data $z_i$ play the role of parameters (common in Bayesian modeling). The choice
of $\pi(\alpha, \beta)$ is such that if we have reconstructed data $z_1, \ldots, z_n$,

$$\begin{aligned} \alpha|z_1, \ldots, z_n &\sim \mathcal{G}_{II}(\eta'/\lambda', m + n), \\ \beta|\alpha, z_1, \ldots, z_n &\sim \mathcal{G}([m + n]\alpha + 1, [m + n]\eta'), \end{aligned}$$

with $\eta' = (m\eta + \sum_{i=1}^{n} z_i)/(m + n)$ and

$$\lambda' = \left(\lambda^{m} \prod_{i=1}^{n} z_i\right)^{1/(m+n)}.$$

Hence, the prior information can be interpreted as coming from a virtual gamma
sample of size $m$, arithmetic mean $\eta$, and geometric mean $\lambda$. It is then easy to
construct a Gibbs algorithm for the Bayesian calculation (Sect. 11.4) of the
distribution of $(\alpha, \beta)$ given $\mathbf{x}_n$.
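
The mixture representation (11.17) can be checked by simulation. In the sketch
below, auxiliary variables are drawn from a gamma distribution with shape $\alpha$
and rate $\beta$, exponential excesses are drawn with these rates, and the empirical
distribution function of the excesses is compared with the GPD distribution function
for $\xi = 1/\alpha$ and $\sigma = \beta/\alpha$. The parameter values and sample size
are arbitrary choices made for the illustration.

```python
import numpy as np


def gpd_cdf(y, xi, sigma):
    """GPD distribution function for xi > 0 and excesses y >= 0."""
    return 1.0 - (1.0 + xi * y / sigma) ** (-1.0 / xi)


rng = np.random.default_rng(3)
xi, sigma = 0.25, 2.0
alpha, beta = 1.0 / xi, sigma / xi               # reparametrization used in the text

# z_i ~ Gamma(shape alpha, rate beta); NumPy uses the scale 1 / beta.
z = rng.gamma(shape=alpha, scale=1.0 / beta, size=50_000)
# y_i | z_i ~ Exponential(rate z_i); NumPy uses the scale 1 / z_i.
y = rng.exponential(scale=1.0 / z)

y_sorted = np.sort(y)
ecdf = np.arange(1, y.size + 1) / y.size
max_gap = float(np.max(np.abs(ecdf - gpd_cdf(y_sorted, xi, sigma))))
print(max_gap)   # Kolmogorov-type distance between the mixture sample and the GPD
```

A small distance is expected because integrating the exponential density against
the gamma density gives $(\alpha/\beta)(1 + y/\beta)^{-\alpha - 1}$, which is the GPD
density with the stated $(\xi, \sigma)$.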
11.2.3.2 The MAXB Case


The GEV distribution (uni- or multivariate) does not allow for any conjugate priors. 
Semi-conjugate ones have been proposed by [251]. As an aside, it is possible to define 
practical prior models for each of the three extreme value distributions: Fréchet, 
Weibull, and Gumbel (cf. Sect. 6.2.1), which seems coherent with respect to the POT 
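results. This is of particular advantage for the Gumbel distribution, with its
cumulative distribution function

$$P(X < x|\psi) = \exp\left[-\exp\left(-\frac{x - \mu}{\sigma}\right)\right] \quad \text{with } \sigma > 0,\ \mu \in \mathbb{R} \text{ and } x \in \mathbb{R},$$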


very often used in hydrology in a Bayesian framework, and for a long time, the most 
used distribution for modeling extreme rainfall [444] despite its drawbacks (cf. Sect. 
6.5). Indeed, it accepts the conjugate prior [159]: 


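$$\pi(\mu, \sigma) \propto \sigma^{-m} \exp\left(m\, \frac{\mu - \bar{\tilde{x}}_m}{\sigma} - \sum_{i=1}^{m} \exp\left\{-\frac{\tilde{x}_i - \mu}{\sigma}\right\}\right),$$

where the hyperparameters $(m, \bar{\tilde{x}}_m, \tilde{x}_1, \ldots, \tilde{x}_m)$, respectively, correspond
to the size of the prior dataset, its mean, and the data itself. Given real data
$\mathbf{x}_n$, the posterior distribution is written as

$$\pi(\mu, \sigma|\mathbf{x}_n) \propto \sigma^{-m-n} \exp\left(\{m + n\}\, \frac{\mu - \frac{m \bar{\tilde{x}}_m + n \bar{x}_n}{m + n}}{\sigma} - \sum_{i=1}^{m} \exp\left\{-\frac{\tilde{x}_i - \mu}{\sigma}\right\} - \sum_{k=1}^{n} \exp\left\{-\frac{x_k - \mu}{\sigma}\right\}\right).$$

This prior modeling is thus relatively usable for dealing with historical data (see
Sect. 11.3.1.1). When $m$ is small, we could imagine using a relationship like (11.32)
to look for the values $\tilde{x}_i$ [118]. Note that $\pi(\mu, \sigma)$ is proper only if $m > 3$.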
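
As a sketch of how this conjugate structure behaves, the Python code below writes
the log-kernel of the Gumbel conjugate form once and uses it both for the prior
(evaluated on virtual data) and for the likelihood (evaluated on real data). It then
checks that their sum coincides with the posterior expression built from the pooled
mean. The virtual values, the simulated sample and the evaluation point are
hypothetical.

```python
import numpy as np


def gumbel_log_kernel(mu, sigma, data):
    """Log of sigma^{-k} exp{k (mu - mean)/sigma - sum_i exp(-(x_i - mu)/sigma)}."""
    data = np.asarray(data, dtype=float)
    k = data.size
    return float(-k * np.log(sigma) + k * (mu - data.mean()) / sigma
                 - np.sum(np.exp(-(data - mu) / sigma)))


def log_posterior_closed_form(mu, sigma, virtual, real):
    """Posterior log-kernel written with the pooled mean of virtual and real data."""
    virtual = np.asarray(virtual, dtype=float)
    real = np.asarray(real, dtype=float)
    m, n = virtual.size, real.size
    pooled_mean = (m * virtual.mean() + n * real.mean()) / (m + n)
    return float(-(m + n) * np.log(sigma) + (m + n) * (mu - pooled_mean) / sigma
                 - np.sum(np.exp(-(virtual - mu) / sigma))
                 - np.sum(np.exp(-(real - mu) / sigma)))


rng = np.random.default_rng(4)
virtual = np.array([48.0, 55.0, 61.0, 70.0, 90.0])       # hypothetical prior dataset, m = 5
real = rng.gumbel(loc=60.0, scale=15.0, size=29)         # simulated yearly maxima, n = 29

mu, sigma = 58.0, 14.0
prior_plus_lik = gumbel_log_kernel(mu, sigma, virtual) + gumbel_log_kernel(mu, sigma, real)
closed_form = log_posterior_closed_form(mu, sigma, virtual, real)
print(prior_plus_lik, closed_form)
```

In other words, updating this prior amounts to appending the real observations to
the virtual ones, which is what makes it convenient for historical data.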

An alternative (semi-conjugate) approach, proposed by [515], reparametrizes the
model through $\exp(\mu/\sigma)$. Conditional on $\sigma$, a gamma distribution on this
quantity is conjugate. However, it remains necessary, in a completely Bayesian approach,
to specify a prior on $\sigma$.

The Fréchet distribution $\mathcal{F}(\psi)$ accepts a semi-conjugate prior [118].
Reparametrizing $\psi = (\mu, \nu, \xi)$ with $\nu = \sigma^{1/\xi} > 0$, the Fréchet cumulative
distribution function is written as

$$P(X < x|\psi) = \exp\left\{-\nu (x - \mu)^{-1/\xi}\right\} \quad \text{with } \xi > 0,\ \mu \in \mathbb{R} \text{ and } x > \mu.$$
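Proposition 11.8 (Fréchet prior [118]) Let $\pi(\mu, \nu, \xi)$ be the prior distribution defined
by

$$\begin{aligned} \nu|\mu, \xi &\sim \mathcal{G}(m, s_1(\mu, \xi)), \\ \xi|\mu &\sim \mathcal{IG}(m, s_2(\mu)), \\ \pi(\mu) &\propto \frac{\mathbb{1}_{\{\mu \le x_{e_1}\}}}{(x_{e_2} - \mu)^{m}\, s_2^{m}(\mu)}, \end{aligned}$$

where $x_{e_1} < x_{e_2}$ and

$$s_1(\mu, \xi) = m (x_{e_1} - \mu)^{-1/\xi}, \qquad s_2(\mu) = m \log\left(\frac{x_{e_2} - \mu}{x_{e_1} - \mu}\right).$$

Then, $\pi(\mu, \nu, \xi)$ is conditionally conjugate on $(\xi, \mu)$ for the Fréchet likelihood and
interpretable as the reference posterior distribution of a sample $\tilde{\mathbf{x}}_m$ of $m$ Fréchet
data points with shifted geometric mean

$$x_{e_2} = \mu + \prod_{i=1}^{m} (\tilde{x}_i - \mu)^{1/m}.$$

In effect, given $\mathbf{x}_n \sim \mathcal{F}(\psi)$, the conditional posterior distributions present a kind
of balance between the real data's and virtual data's statistics:

$$\begin{aligned}
\nu|\mu, \xi, \mathbf{x}_n &\sim \mathcal{G}\left(m + n,\ s_1(\mu, \xi) + \sum_{i=1}^{n} (x_i - \mu)^{-1/\xi}\right), \\
\pi(\xi|\mu, \mathbf{x}_n) &\propto \frac{1}{\xi^{m+n+1}}\, \frac{\exp\left\{-\frac{1}{\xi}\left(m \log(x_{e_2} - \mu) + n \log(\bar{x} - \mu)\right)\right\}}{\left(m (x_{e_1} - \mu)^{-1/\xi} + \sum_{i=1}^{n} (x_i - \mu)^{-1/\xi}\right)^{m+n}}, \\
\pi(\mu|\nu, \xi, \mathbf{x}_n) &\propto \frac{\exp\left(-\frac{1}{\xi}\left[m \log(x_{e_2} - \mu) + n \log(\bar{x} - \mu)\right]\right)}{(x_{e_2} - \mu)^{m} (\bar{x} - \mu)^{n}} \exp\left(-\nu \left[m (x_{e_1} - \mu)^{-1/\xi} + \sum_{k=1}^{n} (x_k - \mu)^{-1/\xi}\right]\right),
\end{aligned} \qquad (11.19)$$

with $\bar{x} = \mu + \prod_{i=1}^{n} (x_i - \mu)^{1/n}$ the shifted geometric mean of the real data.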
A similar result for the Weibull distribution $\mathcal{W}(\psi)$ can be easily deduced by
symmetry from Proposition 11.8. Its cumulative distribution function is written,
with the same reparametrization, as

$$P(X < x|\psi) = \exp\left\{-\nu (\mu - x)^{1/\xi}\right\} \quad \text{with } \nu > 0,\ \xi > 0,\ \mu \in \mathbb{R} \text{ and } x < \mu.$$


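Proposition 11.9 (Weibull prior [118]) Let $\pi(\mu, \nu, \xi)$ be a prior distribution defined
by

$$\begin{aligned} \nu|\mu, \xi &\sim \mathcal{G}(m, s_3(\mu, \xi)), \\ \xi|\mu &\sim \mathcal{IG}(m, s_4(\mu)), \\ \pi(\mu) &\propto \frac{\mathbb{1}_{\{\mu \ge x_{e_2}\}}}{(\mu - x_{e_2})^{m}\, s_4^{m}(\mu)}, \end{aligned}$$

where $x_{e_1} < x_{e_2}$ and

$$s_3(\mu, \xi) = m (\mu - x_{e_1})^{1/\xi}, \qquad s_4(\mu) = m \log\left(\frac{\mu - x_{e_1}}{\mu - x_{e_2}}\right).$$

Then, $\pi(\mu, \nu, \xi)$ is conditionally conjugate on $(\xi, \mu)$ for the Weibull likelihood and
is interpretable as the posterior distribution of a sample $\tilde{\mathbf{x}}_m$ of $m$ Weibull data points
whose shifted geometric mean is equal to $x_{e_2}$:

$$x_{e_2} = \mu - \prod_{i=1}^{m} (\mu - \tilde{x}_i)^{1/m}.$$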


Remark 11.22 The Weibull distribution used in extreme value statistics is a kind
of “reverse” version of the one usually seen in survival analysis or when studying
renewal processes [650]

$$P(X < x|\eta, \beta) = 1 - \exp\left[-\left(\frac{x - \mu}{\eta}\right)^{\beta}\right] \quad \text{with } \eta > 0,\ \beta > 0,\ \mu \in \mathbb{R} \text{ and } x > \mu.$$


Numerous prior Bayesian models have been suggested in these domains, providing 
alternatives to the semi-conjugate choice and inspiring the construction of Bayesian 
models for the Fréchet distribution. In particular, the works [112, 255, 423] suggest 
ways to include engineering expertise in modeling. Nevertheless, remember that 
questioning an expert in an industrial reliability setting is not the same thing as doing 
so on the subject of extreme hazards. The former is probably less risk averse than
the latter (see Sect. 11.3.2.1). 


11.2.4 Hierarchical Modeling 


Physical processes generating the occurrence and intensity of extreme hazards can 
often be partially understood. Physical models, implemented using numerical simu- 
lations (Sect. 10.3), are sometimes capable of reproducing certain features of events. 
For instance, the global climate models used by the IPCC⁹ since 1988 are built using
simulations of the evolution of major climate variables (temperature, rainfall, rela- 
tive humidity, etc.) over mesoscale (several hundred kilometers) spatial grids, as a 
function of measurable or predictable parameters (notably greenhouse gas concen- 
tration). As a function of the spatial and temporal precision, the uncertainty affecting 
these simulations can vary to a considerable extent. Nevertheless, the use of physical 
models allows us, a minima, to get a sense of the influence of correlations between 
hazards from the past and with locations near to a given site. This knowledge, if it 
exists, can add to statistical correlation analyses between sites. 

The interest in exhibiting a partial knowledge of the underlying physical processes, 
by means of contextual data and/or models, is that it makes it possible to place 
the extreme statistical model in a hierarchical modeling framework. The state of 
nature represented by $\psi$, which generates the random variable $X$, is dependent on
(conditioned on) a hidden variable (called latent) denoted by $\zeta$, on which knowledge
is available. As this knowledge is by its very nature uncertain, $\zeta$ can in practice be
considered as a random variable with density $\pi(\zeta)$. In the simple MAXB and POT
settings, we can then define a hierarchical model:

\[
\begin{aligned}
X &\sim \mathrm{GEV}(\psi) \ \text{or} \ \mathrm{GPD}(\psi) && \text{(modeling the data)} \\
\psi &\sim \pi(\psi \mid \zeta) && \text{(modeling the underlying process)} \\
\zeta &\sim \pi(\zeta) && \text{(modeling process parameters)}
\end{aligned}
\tag{11.20}
\]

and the prior distribution $\pi(\psi)$ is thus defined by $\int \pi(\psi \mid \zeta) \pi(\zeta) \, d\zeta$. Clearly, we can
extend this schema further by imagining that, for instance,

\[
\zeta \sim \pi(\zeta \mid \kappa), \qquad \kappa \sim \pi(\kappa).
\]


Reasoning in this way often leads to more robust posterior results with respect 
to uncertainty in prior information [651]. A concrete example is where several prior
information sources provide values $\zeta_1, \ldots, \zeta_q$. A practical strategy is then to consider
this as a sample which can be used to calibrate the distribution $\pi(\zeta)$.
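The calibration strategy just described can be sketched in a few lines. This is only a minimal illustration under assumptions that are not in the text: the latent variable is scalar, its distribution is taken to be Gaussian and calibrated by moments, and the conditional prior of the parameter given the latent variable is also Gaussian with a fixed spread. The function names and the numerical values are hypothetical.

```python
import random
import statistics


def calibrate_latent_prior(zeta_values):
    """Moment-based calibration of a Gaussian pi(zeta) from the source values."""
    mean = statistics.fmean(zeta_values)
    sd = statistics.stdev(zeta_values)  # sample standard deviation (q - 1 denominator)
    return mean, sd


def sample_hierarchical_prior(mean, sd, cond_sd, size, seed=0):
    """Draw psi from pi(psi) = integral of pi(psi | zeta) pi(zeta) d zeta.

    Assumption: pi(psi | zeta) is Gaussian with mean zeta and standard deviation cond_sd.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        zeta = rng.gauss(mean, sd)               # zeta ~ pi(zeta)
        draws.append(rng.gauss(zeta, cond_sd))   # psi | zeta ~ pi(psi | zeta)
    return draws


zeta_sources = [0.05, 0.12, 0.08, 0.15, 0.10]    # hypothetical values from five sources
m, s = calibrate_latent_prior(zeta_sources)
psi_draws = sample_hierarchical_prior(m, s, cond_sd=0.05, size=20000)
```

In this Gaussian sketch, integrating out the latent variable simply adds the two variances, which gives a quick check on the simulated draws.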


⁹ Intergovernmental Panel on Climate Change.
Example 11.35 (Multiple information sources for the parameter $\xi$)


The sign and value of $\xi$, whether in the MAXB or POT setting, greatly influence
the nature of the extreme value distribution’s tail and estimation of return levels.
Suppose for the sake of convenience that a priori,

\[
\xi \sim \mathcal{N}\left(\mu_\xi, \sigma_\xi^2\right)
\]

and that we have several pairs $\{\mu_{\xi,1}, \sigma_{\xi,1}^2\}, \ldots, \{\mu_{\xi,q}, \sigma_{\xi,q}^2\}$ resulting from
the estimation of $\xi$ at sites similar to the one of interest. Writing $\zeta = \{\mu_\xi, \sigma_\xi^2\}$,
we can run statistical tests on the sample of $\zeta_i$ to assess hypotheses of the
form

\[
\mu_\xi \sim \mathcal{N}\left(\bar{\mu}_\xi, \sigma_{\mu_\xi}^2\right), \tag{11.21}
\]
\[
\sigma_\xi^2 \sim \mathcal{IG}\left(a_\xi, b_\xi\right). \tag{11.22}
\]


When different pieces of prior information on $\psi$ are correlated (for example,
when similar sites are geographic neighbors of the site of interest), this correlation
can be taken into account by supposing (for example) that the vector of $\zeta_i$ was
generated by a multivariate model in dimension $2q$ and no longer by (11.21)-(11.22).
In this spirit, [30] provided an example of a hierarchical approach allowing the 
combination of several correlated information sources coming from experts (Sect. 
11.3.2) who influence each other. Also, note that the prior modeling (11.18) proposed 
in Sect. 11.2.3.1 for the POT framework, which uses auxiliary variables (or data) z;, 
is by nature hierarchical. 


11.2.4.1 Spatio-Temporal Covariates and Dependency 


In numerous situations, hierarchical modeling is the result of a link between the 
parameters of extreme value distributions and explanatory covariates: at a given 
site, the extreme hazard depends on factors $Z = (Z_1, \ldots, Z_q)$ for which information
is available. See [695] for a (frequentist) comparison of the most well-known
extreme value spatial hierarchical models. Generally speaking, we can characterize
the parameter vector $\psi$ (or a transformed version thereof) of an extreme value
distribution in the following way:

\[
\psi = g(\beta_0, \beta_1, \ldots, \beta_q, Z_1, \ldots, Z_q), \tag{11.23}
\]

where $\beta = (\beta_i)_i$ is a vector of coefficients to be estimated, each $\beta_i$ representing the
influence of the variable $Z_i$ on $\psi$ ($\beta_0$ being an effect independent of the $Z_i$), and $g$
is a usually monotonic function, linear or otherwise. In Formula (11.23), the values $Z_i$
are generally known (measured in situ, for example) or modeled using statistical
distributions. The latent vector is then $\zeta = \beta$, and the stochastic setting (11.20)
simplifies to a deterministic one.


Example 11.36 (Extreme rainfall) 


The distribution of yearly precipitation maxima (or exceedance values) at site
$i$ is assumed, in [332], to follow a GEV (or GPD) distribution whose location
parameter $\mu_i$ is defined by

\[
\mu_i = \beta^T Z_i, \tag{11.24}
\]

where $Z_i$ is a set of measurable covariates at site $i$, including in particular
the geographic coordinates (latitude, longitude). Without taking into account
temporal variation in $\beta$ (see Example 11.37 below), the prior distribution on
$\beta$ is a multivariate Gaussian one:

\[
\beta \sim \mathcal{N}_q\left(\beta_0, \Sigma_\beta\right). \tag{11.25}
\]

A priori, we know little about the distribution of the factors of influence $\beta_i$.
It therefore seems like a good idea to take a hierarchical approach and put
weakly informative distributions on $(\beta_0, \Sigma_\beta)$, making sure that the joint
posterior distribution of $\psi_i = (\mu_i, \sigma_i, \xi_i)$ remains well-defined. Gelman [325]
has proposed a family of distributions that adapts well to the specific choice
of $\pi(\Sigma_\beta)$.


Remark 11.23 Bayesian regionalization approaches (Sect. 11.3.1.2), which allow 
for the introduction of spatial correlation into extreme value distributions, can be 
interpreted as approaches where the explanatory covariates $Z_i$ are the values of $\psi$
(or alternatively of the hazard X) itself in regions nearby the site of interest. 


Temporal dependency in extreme value models (Chap. 8) can also be inserted at
the level of the latent variable $\zeta$. For example, the vector $\zeta_t, \zeta_{t+1}, \ldots, \zeta_T$ allows us to
model the impact of seasonality over an entire region. More precisely, according to
[176], it can help model non-stationary effects through seasonal ones when temporal
variation is expected but undetectable in small datasets. The dimension-reduction
techniques used in Chap. 8 apply here; thus, $\zeta_t$ generally reduces to an autoregressive
process.
Example 11.37 (Extreme rainfall)


The modeling in Example 11.36 (from [332]) can be expanded to take into
account the temporal trend seen in $\mu$. In (11.24), the $Z_i$ have fixed values,
independent of time, and an autoregressive process is placed on the components
of $\beta$:

\[
\beta_{k,t} = \alpha_k \beta_{k,t-1} + \varepsilon_t \quad \text{for } k \in \{1, \ldots, q\}, \tag{11.26}
\]

with $\varepsilon_t \sim \mathcal{N}(0, \sigma_\beta^2)$. The prior modeling (11.25) therefore has to be turned into
a prior modeling of $(\alpha_1, \ldots, \alpha_q, \beta_0, \sigma_\beta^2)$. Following the suggestion of [325],
we place a weakly informative inverse gamma distribution $\mathcal{IG}(0.1, 100)$ on
$\sigma_\beta^2$. Though [332] proposed an ad hoc pointwise estimation approach for $\beta_0$, a
hierarchical choice is made for each $\alpha_k$ by introducing a new hyperparameter
$\sigma_\alpha^2$:

\[
\alpha_k \mid \sigma_\alpha^2 \sim \mathcal{N}\left(0, \sigma_\alpha^2\right), \qquad
\sigma_\alpha^2 \sim \mathcal{IG}(0.1, 100).
\]

In doing so, however, the prior information on the $\alpha_k$ is made quite vague, and
the posterior distribution depends almost exclusively on the information in the data.
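To make the autoregressive structure placed on a regression coefficient more concrete, the following sketch simulates one trajectory of a first-order autoregressive process with Gaussian noise. It is a hedged illustration only: the function name, the starting value, and the numerical settings are hypothetical and are not taken from [332].

```python
import random


def simulate_ar1_coefficient(alpha, sigma_beta, beta_start, n_steps, seed=0):
    """Simulate beta_t = alpha * beta_{t-1} + eps_t with eps_t ~ N(0, sigma_beta^2)."""
    rng = random.Random(seed)
    path = [beta_start]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, sigma_beta)   # Gaussian innovation
        path.append(alpha * path[-1] + eps)
    return path


# Hypothetical settings: a slowly decaying coefficient observed over 50 time steps.
path = simulate_ar1_coefficient(alpha=0.8, sigma_beta=0.1, beta_start=1.0, n_steps=50)

# With the noise switched off, the recursion reduces to a geometric decay.
noiseless = simulate_ar1_coefficient(alpha=0.8, sigma_beta=0.0, beta_start=1.0, n_steps=3)
```

In a full analysis, the autoregression coefficient and the noise variance would themselves be drawn from their priors rather than fixed as they are here.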


Remark 11.24 The special case of Bayesian linear regression is the subject of a vast
literature. In particular, the use of Zellner’s prior modeling (the so-called
g-prior [796]) on the vector of coefficients $\beta$ is generally recommended. For more
information, see [651] (Sect. 4.3.3).


11.2.4.2 Model Complexity 


The use of correlation models between $\psi$ and other variables is fairly standard when
dealing with extreme natural hazards problems because we look to take advantage of 
the spatial and temporal dependence of extreme values to increase the information 
in the dataset—or make up for its absence. Hierarchical prior modeling thus appears 
to be a useful tool for engineers, on the condition that the stack of hierarchical layers 
still allows the production of a posterior distribution that is more informative than 
the prior. Indeed, it might be tempting to build a complex model that tries to take 
into account spatial and/or temporal random effects, and then be confronted on the 
one hand by complicated posterior calculations (Sect. 11.4), and on the other by the 
problem of nonlearning or an improper posterior distribution. 

A nonlearning problem can appear when the posterior distributions correspond 
to the prior ones for some or all of the parameters. This means that the data is not 
informative for these parameters. An improper posterior distribution is generally
due to the stacking of improper (noninformative) priors on the parameters in the
upper levels of the hierarchy (those furthest from the likelihood; for instance, $\Sigma_\beta$
in Example 11.36). An added difficulty is that computational techniques used to
produce posterior distributions (Sect. 11.4) can incorrectly give the impression that
the distribution being looked for is well-defined. Warnings to this effect are described
in [652] (Sect. 10.4.3).

Therefore, it is important to algebraically confirm that the posterior distribution
is indeed proper.¹⁰ Moreover, it is strongly advised to increase the level of
hierarchical complexity progressively, for example by setting some parameters as
constants when testing Bayesian calculation tools, rather than putting poorly
informative prior measures on them. The software tool described in Sect. 11.4.4 is
able to quickly and easily modulate the level of complexity. To limit such complexity,
results on taking into account trends like (11.26) in hierarchical models generally
suppose that only one parameter is concerned (usually the location parameter $\mu$, as
$\xi$ is tricky to estimate and often left constant).


Example 11.38 (Extreme rainfall in Spain) 


A spatio-temporal Bayesian GEV model was recently proposed by [305] for
modeling the occurrence and intensity of extreme rainfall in Extremadura,
Spain. Non-stationarity was taken into account through a linear temporal trend
placed in the location parameter at site $i \in \{1, \ldots, S\}$:

\[
\mu_i(t) = \mu_0(i) + \mu_1(i) \, t + \varepsilon_i(t),
\]

where $(\mu_0, \mu_1)$ are terms of the type $\beta^T Z$ (Example 11.36) and $\varepsilon_i(t)$ is model
noise not taken into account in the spatio-temporal part. As in [332], autoregressive
models were used for the regression coefficients $\beta$ and weakly informative
prior measures put on the coefficients $\alpha$ of the model. Though complex, this
model can still be coded using (for instance) the BUGS language (from the 
OpenBUGS software, cf. Sect. 11.4.4). The authors were then able to run a 
progressive series of tests, including notably whether incorporating a temporal 
trend affected the estimation quality. 


10 If the prior is proper in Y and the likelihood has a maximum in the interior of the same space, 
the posterior distribution is automatically proper. 
11.3 Calibration


Methods for calibrating a prior distribution $\pi(\psi)$, i.e., estimating its hyperparameters,
vary widely depending on the nature of available information, of which there are two
main types:


1. Prior information can be carried by other data obtained from studying the past
(historical data) or from studying the region around the site of interest (regional
data). The empirical Bayesian approach (Sect. 11.3.1.3) is a special case, generally
disapproved of from a theoretical point of view, where the actual data
$\mathbf{x}_n$ is used, thus bringing a troublesome redundancy when using the posterior
results. We nevertheless mention it here as it is of practical interest.

2. Prior information can come from an elicitation¹¹ study of human expertise.
Specifying who counts as an expert depends closely on the applied setting being considered.
Along with [572], we consider that this typically means engineers, researchers,
project managers, and specialists in one or several domains (hydrology, oceanography,
climatology, etc.) with plenty of “field experience”.


Calibration techniques are therefore based on a direct or indirect statistical interpre- 
tation of prior information, and the infinite number of possible combinations of such 
sources leads to just as many possible methods. However, we focus on only a small 
number here. Their strengths include being easily explainable and able to clearly 
signal the choices made due to the subjectivity of analysts. 


11.3.1 Data-Dependent Calibration 


11.3.1.1 Calibration from Historical Data 


In cases where the information is directly provided by historical data Xm € x1, 
several authors have proposed to use it in a Bayesian framework to calibrate prior 
distributions, especially in hydrology [588, 602, 638]. Three representative studies 
are [759], which looked at extreme sea levels for dimensioning dikes in the Nether- 
lands (Example 11.39), [588] which considered flood forecasting for the Garonne 
river, and [128] which looked at coastal defences in France in the context of Cyclone 
Xynthia (February 2010) in La Rochelle. 


11 From the Latin word eliciere, meaning to extract, obtain, bring to mind. 
Example 11.39 (Marine flooding)


To study dike dimensioning in the Netherlands, [759] proposed a Bayesian
approach on a Gumbel distribution with parameter vector $\psi$ by constructing a prior
distribution based on a sorted sample of the $m = 10$ highest sea levels $x^*_1 < \ldots < x^*_m$
observed in the past. The distribution of each $i$-th order statistic $x^*_i$ is known
explicitly given $\psi$:

\[
f_{X^*_i}(x \mid \psi) = \frac{m!}{(i - 1)! \, (m - i)!} \, F(x \mid \psi)^{i-1} \left(1 - F(x \mid \psi)\right)^{m-i} f(x \mid \psi).
\]

A statistical likelihood can therefore be determined, as well as its maximum
likelihood estimator $\hat{\psi}_m$. The bootstrap distribution of this estimator can be
used to define the distribution $\pi(\psi)$. This kind of method is relatively simple
to implement, but the quality of the bootstrap distribution is limited by the
small sample size, leading perhaps to overly informative prior distributions.
The POT approach (limited to the $\xi = 1$ case) can also be used to check that
the results are coherent in a noninformative Bayesian setting.
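The order-statistic density used in this example can be evaluated directly. The sketch below is a minimal illustration, assuming a Gumbel distribution parametrized by a location `mu` and a scale `sigma` (names chosen here for illustration); it is not the implementation of [759].

```python
import math


def gumbel_cdf(x, mu, sigma):
    """Gumbel cumulative distribution function."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def gumbel_pdf(x, mu, sigma):
    """Gumbel probability density function."""
    z = (x - mu) / sigma
    return math.exp(-z - math.exp(-z)) / sigma


def order_statistic_pdf(x, i, m, mu, sigma):
    """Density of the i-th smallest of m i.i.d. Gumbel(mu, sigma) values."""
    coef = math.factorial(m) / (math.factorial(i - 1) * math.factorial(m - i))
    cdf = gumbel_cdf(x, mu, sigma)
    return coef * cdf ** (i - 1) * (1.0 - cdf) ** (m - i) * gumbel_pdf(x, mu, sigma)


# Density of the third smallest of ten values, evaluated at the location parameter.
value = order_statistic_pdf(0.0, i=3, m=10, mu=0.0, sigma=1.0)
```

Summing the logarithms of such terms over observed order statistics, or using their joint density when they are treated together, gives the likelihood that would then be maximized and bootstrapped.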


Remark 11.25 A similar approach based on sorting the smallest recorded values 
(without an environmental application in mind) has been proposed by [497]. Using the 
Gumbel distribution, the authors provided a full Bayesian framework for predicting 
new values. Based on a Jeffreys prior, this resulted in defining a new semi-conjugate 
prior (Sect. 11.2.3) with the help of the recorded sample. 


More generally, the point of view given in Equation (11.10), with $\pi(\psi)$ defined
as a posterior distribution of historical data, is meaningful: the conjugate Gumbel
prior given in Sect. 11.2.3.2 falls exactly into this framework. Note that by expanding
(11.10), there therefore exists a likelihood $\tilde{f}(\tilde{\mathbf{x}}_m \mid \psi)$ such that

\[
\pi(\psi) \propto \tilde{f}(\tilde{\mathbf{x}}_m \mid \psi) \, \pi^J(\psi). \tag{11.27}
\]

If the historical data $\tilde{\mathbf{x}}_m$ is in the same form as the data $\mathbf{x}_n$ (block maxima or POT),
$\tilde{f}$ is defined using the same densities as $f$. Thus, [128] used the uniform measure
$\pi^J(\psi) \propto 1$ proposed by [602] for data with a GPD distribution ($\tilde{f}(\cdot \mid \psi) = f(\cdot \mid \psi)$),
despite the difficulties mentioned in Remark 11.18 on the subject of the uniform
distribution and the possible prior choices provided in Table 11.2.

However, historical data can potentially be a mixture of maxima, exceedance, or
simply Poisson count data [374, 759], which moreover may be noisy or censored.
A consequence of this is that we need to take particular care¹² in defining $\tilde{f}$ (and
$\pi^J(\psi)$). See [128] for obtaining such likelihoods, as well as Example 4.9.

¹² Especially in the choice of the noninformative prior $\pi^J$, which should ideally be able to adapt
to the presence of censored historical data; see for example [206].

Recall that $\psi$ can define a parametrization containing both GEV and GPD parameters through
(asymptotic) theoretical results between parameter triples [428, 487]:


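\[
\begin{aligned}
\xi_{GEV} &= \xi_{GPD} = \xi, \\
\sigma_{GPD} &= \sigma_{GEV} + \xi \left(\mu_{GPD} - \mu_{GEV}\right), \\
\mu_{GPD} &= \mu_{GEV} + \frac{\sigma_{GEV}}{\xi} \left(\lambda^{-\xi} - 1\right).
\end{aligned}
\]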
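The correspondence between GEV and GPD parameters can be checked numerically. The sketch below relies on the standard asymptotic (point process) relations, under the assumptions that the shape parameter is nonzero and that `rate` denotes the mean number of threshold exceedances per block; the function names and the symbol `rate` are introduced here for illustration only.

```python
def gev_to_gpd(mu_gev, sigma_gev, xi, threshold):
    """Return (sigma_gpd, rate) for exceedances of `threshold`, assuming xi != 0.

    sigma_gpd is the GPD scale above the threshold and rate is the mean number
    of exceedances per block implied by the GEV parameters of block maxima.
    """
    sigma_gpd = sigma_gev + xi * (threshold - mu_gev)
    if sigma_gpd <= 0.0:
        raise ValueError("threshold lies outside the support implied by the GEV parameters")
    rate = (1.0 + xi * (threshold - mu_gev) / sigma_gev) ** (-1.0 / xi)
    return sigma_gpd, rate


def threshold_from_rate(mu_gev, sigma_gev, xi, rate):
    """Inverse relation: threshold exceeded on average `rate` times per block."""
    return mu_gev + sigma_gev / xi * (rate ** (-xi) - 1.0)


# Hypothetical GEV parameters for block maxima and a threshold of 15.
sigma_gpd, rate = gev_to_gpd(mu_gev=10.0, sigma_gev=2.0, xi=0.1, threshold=15.0)
```

Applying `threshold_from_rate` to the returned rate recovers the threshold, which is a convenient consistency check when moving between the two parametrizations.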


Remark 11.26 In fact, most authors treat historical information like a choice of
statistical likelihood, guided by the type of data (censored, truncated, noisy, etc.),
and describe their Bayesian framework as objective in that they rely on noninformative
measures $\pi^J$ [588]. There is thus no elicitation of prior distributions, strictly
speaking. Nevertheless, a prior defined as $\pi^J(\psi \mid \tilde{\mathbf{x}}_m)$ is similar to the one defined
as $\pi^J(\psi)$ plus a historical data likelihood.


Giving weight to historical information. Another problem is weighting historical
information with respect to the data $\mathbf{x}_n$. This is because “historical data” may be
informative but sparse (not measured/recorded systematically over time), poorly
calibrated with respect to today’s standards, or less representative of current and
future hazards than recently obtained data $\mathbf{x}_n$. For instance, very old data on a certain
river breaking its banks may have been measured at a time when the river’s flow (its
riverbed, etc.) was different or the measurements less precise (see the first chapter
of this book and Sect. 5.2.1 for more details).

It is thus necessary to decide on an equivalent data quantity $\tilde{m} \le m$, “flattening”
the historical data’s likelihood $\tilde{f}(\tilde{\mathbf{x}}_m \mid \psi)$ (Fig. 11.4); that is, we replace $\tilde{f}(\tilde{\mathbf{x}}_m \mid \psi)$ in
(11.27) with

\[
\tilde{f}_{\tilde{m}}(\tilde{\mathbf{x}}_m \mid \psi) = \frac{\tilde{f}^{\,\tilde{m}/m}(\tilde{\mathbf{x}}_m \mid \psi)}{\int \tilde{f}^{\,\tilde{m}/m}(\tilde{\mathbf{x}}_m \mid \psi) \, d\tilde{\mathbf{x}}_m}. \tag{11.28}
\]

This flattening rule is also called fractional [94] and indirectly defines what numerous
authors call power priors [160]. The reason for doing this is as follows: let $\tilde{f}_i(\cdot \mid \psi)$
be the density of each historical data event $\tilde{x}_i$. Supposing the $\tilde{x}_i$ are independent, we
have

\[
\tilde{f}(\tilde{\mathbf{x}}_m \mid \psi) = \prod_{i=1}^{m} \tilde{f}_i(\tilde{x}_i \mid \psi)
\]

and

\[
\tilde{f}_{\tilde{m}}(\tilde{\mathbf{x}}_m \mid \psi) = \arg\min_{f} \sum_{i=1}^{m} w_i \, \mathrm{KL}\left(f, \tilde{f}_i(\cdot \mid \psi)\right),
\]

where

• each weight $w_i = w = \tilde{m}/m$ is an estimate of the fraction of information found
in the historical data point $\tilde{x}_i$ with respect to a normal (non-historical) data point.
This so-called weighted likelihood approach was initially proposed by [398].

• the Kullback-Leibler divergence $\mathrm{KL}(f, \tilde{f}_i(\cdot \mid \psi))$ is a measure of the information
loss occurring when choosing $\tilde{f}_i$ in the place of $f$ to represent historical information
on $\psi$.

Essentially, the geometric mean $\tilde{f}_{\tilde{m}}$ corresponds to an estimate of an unknown optimum
$f$ “conveying” in an exhaustive manner historical information on the parameter
vector.

Fig. 11.4 Illustration of the flattening of a GEV likelihood with $m = 40$ historical data values, with
$\mu$ constant, for $\tilde{m} \in \{m, 2m/3\}$ (right-hand column) and $\tilde{m} \in \{m/2, 1\}$ (left-hand column)

If we suppose we are able to more precisely quantify the percentage of information
$\delta_i \in [0, 1]$ contained in historical data points $\tilde{x}_i$ with respect to current-day ones, the
rule (11.28) can be refined into

\[
\tilde{f}_{\tilde{m}}(\tilde{\mathbf{x}}_m \mid \psi) = \frac{\prod_{i=1}^{m} \tilde{f}_i^{\,\delta_i}(\tilde{x}_i \mid \psi)}{\int \prod_{i=1}^{m} \tilde{f}_i^{\,\delta_i}(\tilde{x}_i \mid \psi) \, d\tilde{\mathbf{x}}_m}. \tag{11.29}
\]
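The flattening rules can be made concrete in a case where the normalizing integral has a closed form. The sketch below assumes, purely for illustration, that the historical data points are independent Gumbel observations with location `mu` and scale `sigma`; for a weight `w` in (0, 1], the integral of the density raised to the power `w` then equals `sigma**(1 - w) * Gamma(w) / w**w`. The function names and data values are hypothetical.

```python
import math


def gumbel_logpdf(x, mu, sigma):
    """Log-density of a Gumbel(mu, sigma) observation."""
    z = (x - mu) / sigma
    return -math.log(sigma) - z - math.exp(-z)


def flattened_logpdf(x, mu, sigma, weight):
    """Log of f^weight / integral(f^weight) for one Gumbel data point, 0 < weight <= 1."""
    log_norm = ((1.0 - weight) * math.log(sigma)
                + math.lgamma(weight) - weight * math.log(weight))
    return weight * gumbel_logpdf(x, mu, sigma) - log_norm


def flattened_loglik(hist_data, mu, sigma, weights):
    """Flattened historical log-likelihood with one weight per data point."""
    return sum(flattened_logpdf(x, mu, sigma, w) for x, w in zip(hist_data, weights))


hist = [4.1, 5.3, 3.8, 6.0]        # hypothetical historical maxima
m = len(hist)
m_tilde = 2.0                      # equivalent data quantity, m_tilde <= m
# Common weight m_tilde / m for every historical point (rule 11.28).
common = flattened_loglik(hist, 4.0, 1.0, [m_tilde / m] * m)
# Point-specific weights delta_i (rule 11.29).
per_point = flattened_loglik(hist, 4.0, 1.0, [1.0, 0.5, 0.25, 0.25])
```

Note that the normalizing term depends on the scale parameter, so dropping it (as is often done with unnormalized power priors) changes the resulting prior as a function of the parameters.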
The approaches proposed in [128, 588] correspond to choosing $\tilde{m} = m$, running
the risk of over-weighting the final posterior distribution with the prior. Note that
other rules could also be considered for choosing $\tilde{m}$. For example, we could by default
give the entire historical data record the same weight as a single regular data point
($\tilde{m} = 1$), representing a summary of the state of prior knowledge. This approach was
considered by [220], among others. We could also provide a calibration rule of the
form

\[
\tilde{m}^* = \arg\max_{\tilde{m} \le m} \mathrm{Coh}\left(\pi(\cdot \mid \tilde{m}) \,\|\, \mathbf{x}_n\right),
\]

where $\mathrm{Coh}$ is a coherency rule between the prior $\pi(\psi \mid \tilde{m})$ and regular data $\mathbf{x}_n$ (see for
example [113]).


Remark 11.27 The flattening procedure (11.28) is similar to the one used to calculate
fractional Bayes factors [573], competitors to intrinsic Bayes factors [81],
which allow for model selection when priors $\pi(\psi)$ are improper. Building a posterior
distribution given a likelihood weighted by a power term is also referred to as
generalized Bayesian inference [359].


11.3.1.2 Calibration from Regional Data 


It would seem natural to construct a prior $\pi(\psi)$ on the parameters characterizing
the distribution of a natural hazard at a given site of interest $i$ with the help of
information from neighboring sites $j \in \{1, \ldots, S\}$, if these are subject to similar
natural hazards and have the same physical characteristics (similar land type, same
relationship to a water feature, etc.). The region containing these sites is then called
spatially homogeneous, a notion similar to the stationarity of temporal
processes [322]. A typical approach is the one proposed by [177]: calibrate a
multidimensional normal prior $\pi(\psi)$ starting from a range of frequentist estimates
(maximum likelihood) at various neighboring sites.
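A moment-based version of this calibration can be sketched as follows. This is a hedged illustration, not the procedure of [177]: it assumes that a tuple of maximum likelihood estimates is available for each neighboring site and simply takes their empirical mean and covariance as the hyperparameters of the normal prior. The function name and the numerical values are hypothetical.

```python
def calibrate_normal_prior(site_estimates):
    """Empirical mean vector and covariance matrix of per-site parameter estimates.

    site_estimates: list of tuples (mu, sigma, xi), one per neighbouring site.
    """
    n = len(site_estimates)
    d = len(site_estimates[0])
    mean = [sum(est[k] for est in site_estimates) / n for k in range(d)]
    cov = [
        [
            sum((est[a] - mean[a]) * (est[b] - mean[b]) for est in site_estimates) / (n - 1)
            for b in range(d)
        ]
        for a in range(d)
    ]
    return mean, cov


# Hypothetical maximum likelihood estimates (mu, sigma, xi) at four neighbouring sites.
estimates = [(10.0, 2.0, 0.1), (12.0, 2.5, 0.2), (11.0, 3.0, 0.0), (11.0, 2.5, 0.3)]
prior_mean, prior_cov = calibrate_normal_prior(estimates)
```

In practice, more sites than parameters are needed for the covariance matrix to be nondegenerate, and the scale parameter is often handled on a logarithmic scale to respect its positivity.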

More generally, Bayesian regionalization methods aim to provide a sample $\mathbf{x}$ of
extreme values characterizing the behavior of hazards across the supposedly homogeneous
regional spatial grid. Another possibility is to produce a sample of estimates
$\theta = \{\psi_1, \ldots, \psi_S\}$ of the parameter vector $\psi$ of the regionalized extreme value
distribution. Samples $\mathbf{x}$ or $\theta$ can then be used to calibrate a prior $\pi(\psi)$ on the parameter
vector $\psi$ at the site of interest $i$. This is only allowed if $\mathbf{x}$ does not incorporate
observations (potentially) made at site $i$ or, equivalently, if $\psi_i \notin \theta$.

As for standard regional approaches (Chap. 7), these methods require a
homogenization metric for data (or estimators of $\psi$) to be put in place between
sites,¹³ as well as a method for delimiting the homogeneous region [45]. The choice
of metric may be directly inspired by a standard rule of thumb, as in the following
example.

¹³ More generally, the terms upscaling or downscaling are used to describe this kind of
homogenization.


Example 11.40 (Regional flooding analysis)

In order to predict flooding at a site $i \in \{1, \ldots, S\}$ using data from the same region,
which is supposed homogeneous in terms of hydrological characteristics, hydrologists
define what is known as an index flood $C_i$, where the rate of flow $Q_i$ at site $i$ is
such that $Q_i = C_i Q_R$, where $Q_R$ is a regionalized rate of flow (see Chap. 7). The
strategy suggested by [562] and implemented by [648] is to consider that the extreme
values at each site $j \ne i$ should be homogenized via $C_j$. Considering a GPD model
(POT approach), [648] proposed a direct homogenization of the estimated parameters
$(\mu_j, \sigma_j, \xi_j)$ at site $j$. The new estimated homogenized parameters $(\tilde{\mu}_j, \tilde{\sigma}_j, \tilde{\xi}_j)$ are
calculated using

\[
\tilde{\mu}_j = C_i \mu_j, \qquad \tilde{\sigma}_j = C_i \sigma_j, \qquad \tilde{\xi}_j = \xi_j,
\]

and the scaling factor $C_i$ is estimated independently of the measurements taken at
site $i$. The regionalized sample of $(\tilde{\mu}_j, \tilde{\sigma}_j, \tilde{\xi}_j)$ then allows an estimation of the prior
hyperparameters to be made on the vector $\psi = (\mu, \sigma, \xi)$ characterizing the extreme
value distribution at site $i$. In the case study [648], the choice of types of prior
(lognormal distributions) was essentially justified by the positivity constraint on $(\mu, \sigma)$. In
the same POT setting and for the same application, but in a non-stationary context,
[642] proposed a prior construction method using gamma distributions, appreciated
by Coles and Tawn [179] for their flexibility, summarizing prior regional information
in terms of estimated quantiles at sites neighboring $i$.

The study [322], working in a GEV framework for the prediction of extreme
floods, joined a supposedly noninformative¹⁴ uniform prior $\pi^J(\psi)$ with a statistical
likelihood defined as the product of likelihoods, each characterizing the observations
at one site, as a function of $\psi$ (including censored data at ungauged sites, cf. Example
4.9).

Consequently, $\psi$ does not refer to one site in particular, but a whole region, and the
authors supposed that the data was independent, conditional on $\psi$. Removing from
this likelihood the part $L_i(\psi)$ corresponding to the site $i$ of interest, this approach
leads to a posterior distribution which can play the role of an informative prior to
be combined with $L_i(\psi)$, following similar rules to those given for dealing with historical
data (cf. Sect. 11.3.1.1).

More recent approaches are based on metrics defining the proximity of sites with 
the help of geostatistics tools. Prior objects constructed in this way are generally 
hierarchical (cf. Sect. 11.2.4), as they aim to model the hypothesis of the existence 
of subjacent physical processes that characterize the data. 

¹⁴ See Remark 11.18.

Using such a strategy for a study of regionalized rainfall prediction under the
MAXB framework, [634] suggested using a hierarchical approach to model $M$ simultaneous
parameter values of the form $\xi$ over the set of $M$ regions surrounding the
one of interest. These values came from statistical estimates and made it possible to
define a “hidden” regional sample. To take into account correlation between values of
$\xi$ (a necessity due to the physical nature of the problem), it would seem meaningful
to put a prior on the vector $(\xi_1, \ldots, \xi_M)$, such as for instance a multidimensional
Gaussian:

\[
(\xi_1, \ldots, \xi_M) \sim \mathcal{N}_M\left((\nu_1, \ldots, \nu_M)^T, \Sigma_\xi\right),
\]

where $\Sigma_\xi$ is a covariance matrix defined by

\[
\Sigma_\xi = \sigma_\xi^2 \, \rho_\xi,
\]

with $\rho_\xi$ a correlation matrix and $\sigma_\xi^2$ the variance of $\xi$. Here, a hierarchical approach
corresponds to proposing a method for producing the $\nu_i$ and putting priors on $\sigma_\xi^2$
and $\rho_\xi$ which improve the posterior separation between the estimates of these two
components of the covariance of $(\xi_1, \ldots, \xi_M)$. We could for example consider an
approach where $\pi(\sigma_\xi^2)$ is chosen to be noninformative, together with a correlation model
based on the distance $r_{i,j}$ between measuring stations (or regions) $i$ and $j$, such that

\[
\rho_\xi(i, j) = \exp(-r_{i,j}/R) \quad \text{for } i \ne j, \ (i, j) \in \{1, \ldots, M\}^2,
\]

where $R$ is a (hyper)parameter having the sense of a representative correlation
distance (which could itself be subject to supplementary information and merit its own
prior distribution).
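The covariance structure described above, a common variance multiplied by a correlation that decays exponentially with distance, can be assembled as in the following sketch. Station coordinates, the variance, and the correlation distance are hypothetical values chosen for illustration, and Euclidean distance between planar coordinates is an assumption.

```python
import math


def exponential_covariance(coords, sigma2, corr_distance):
    """Covariance matrix sigma2 * rho with rho(i, j) = exp(-r_ij / corr_distance).

    coords: list of station coordinates; r_ij is the Euclidean distance between them.
    The diagonal equals sigma2 because the distance from a station to itself is zero.
    """
    n_sites = len(coords)
    cov = [[0.0] * n_sites for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(n_sites):
            r_ij = math.dist(coords[i], coords[j])
            cov[i][j] = sigma2 * math.exp(-r_ij / corr_distance)
    return cov


# Three hypothetical stations on a line, 5 distance units apart.
stations = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
cov = exponential_covariance(stations, sigma2=0.04, corr_distance=10.0)
```

A small correlation distance makes the sites nearly independent a priori, whereas a large one pools them strongly, which is why this hyperparameter may deserve its own prior.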


11.3.1.3 The Empirical Bayes Approach 


The empirical Bayes approach [144], which consists of estimating the prior
hyperparameters using the data,¹⁵ is conceptually hard to accept [643] because it involves
artificially duplicating all or part of the information provided by the likelihood.
Posterior credibility regions are therefore potentially under-estimated. For such reasons,
the regional hydrological analyses performed by [277, 493], based on an empirical
calibration that uses data from the site being studied, have been firmly criticized by
[648].

Nevertheless, provided a sensitivity study demonstrates the robustness of the posterior
distribution to this empirical calibration, this type of approach continues to be used,
particularly in engineering, where it is perceived as a quick and often efficient
approximation to the full Bayesian methodology (see for example [655]). Indeed, this was
the approach adopted by [220] for the quasi-conjugate Bayesian model for a GPD
described in Sect. 11.2.3.1, and by [515] for a Gumbel distribution.

¹⁵ As opposed to the semi-empirical approach which estimates some of the hyperparameters using
the data and the rest using real complementary information.


Example 11.41 (Extreme rainfall and temperature) 


Work by [162] has shown using simulation in the bivariate case that this 
approach, originally proposed by [430] for predicting floods, can be efficient 
when there are fifty data points or more for extreme value models conditioned 
on non-extreme-valued data. The method was applied to both rainfall and 
temperature data. 


11.3.2 Calibration from Expert Judgment 


11.3.2.1 Elicitation, Probabilistic Interpretation, and a Warning 


Expert judgment plays an important role in decision-making in many domains. It can 
give a guiding hand to studies and help to put scientific results in order of importance 
[180]. It can also considerably improve economic [464] and actuarial [748] studies 
on the financial impact of risks and associated flows. It may even be a deciding 
factor in a court case, public policy-making [522], or environmental governance 
[228, 516]. Its influence on technological, economic, political, societal, and even 
personal predictions and especially in establishing strategies for obtaining a profit 
has been analyzed in the epistemological literature for a long time [241, 280]. Recent 
studies conducted in the field of climate change use structured expert judgment to 
tackle the difficulties of deterministic modeling approaches, and to improve the range 
of predictions. In particular, the impact of ice sheets on future sea-level rise, which threatens
the viability of coastal communities, was assessed in this way [55]. In the case
of extreme value statistics applied to natural hazards, [177, 179]—among others— 
have highlighted the possibility of questioning experts on the upper quantiles of 
these types of distribution. In a similar manner, [706] (Sect. 3.4.2) used a regional 
analysis to build a prior model for a GEV distribution at a site of interest using expert 
knowledge on neighboring sites. 

Because it is a theory of decision-making, Bayesian theory allows expert opinions
to be integrated into a statistical problem by interpreting such expertise as a
bet, that is, as an underlying decision.

This subjective point of view, progressively developed by Ramsey, Savage, De 
Finetti, and Koopman [281], simultaneously acts as one of the most attractive (based 
on “true-life” experience) and most criticized (how should the subjectivity be con- 
trolled?) features of the Bayesian approach. 

This kind of criticism is even more justified when dealing with extreme events. 
As Coles and Powell have noticed [177], a fundamental question is what importance
we should accord to advice coming from an expert who may be subjected to the stress
of assuming some part of the responsibility for a possible serious accident. In this
field, it would seem difficult to guarantee that expertise be risk neutral*, with little
emotional bias, conflicts of interest, or other sources of bias that typically appear when
asking for expertise [77, 755]. Court rulings, such as the one condemning six Italian
experts from the “Major Risks” commission to seven years in prison (October 22nd,
2012) for having under-estimated the destructive power of the L’Aquila earthquake of
April 6th, 2009 (a decision finally reversed in 2014), can strongly limit the willingness
of experts to provide honest opinions about the imminence or extent of catastrophic 
events [470]. Indeed, it is quite possible that an expert will cede to the idea of non- 
compensation, described in the economics setting by Bernard Cazes. It means that 
“the successes of a forecaster cannot compensate for their failures, since supposing 
that if one given method has been used several times, it should lead to comparable 
results, whereas the presence of errors leads us to the question if the successes were 
simply fortuitous”. [702]. 

However, it is also a good idea to highlight the conclusion of Franklin and Sisson 
[290] from their in-depth bibliography on the application of extreme value distribu- 
tions: 


It is reasonable to give human intuition the ‘last word’ in risk assessment, while at 
the same time trying to use formal statistical methods as a kind of prosthesis to 
supplement its known weaknesses. 


Subjectivity—which is also expressed in the choice of probability distributions 
whose validity depends on the difficulty to verify theoretical conditions—will always 
taint the quantification of extreme hazards, whether Bayesian or not. In much broader 
applied settings, authors such as Gelman and Hennig [326] and Sprenger [716, 717] 
have emphasized the scientifically relevant and acceptable nature of the subjective 
Bayesian approach. For these authors, the use of available, moderate, and reliable 
expert judgment is a useful way to improve forecasts and extrapolation; to ignore it 
could be damaging or even nonsensical. 

To bolster confidence in posterior results, many formal elicitation techniques have 
been developed over the last 50 years, aiming to train experts and thus improve the 
accuracy of their assertions. Confronted with the permanent need to ensure that 
knowledge is retained, industry, the health sector, and financial industries have often 
provided the required funding to do so. In [572], the history and development of
such methods are examined in detail, and useful advice is provided to limit the
impact of biases coming from experts. The seminal article [755] and the works [335,
421] illustrate such biases and provide decision heuristics* allowing expert advice to
be “captured” by deciding whether risk aversion* (or the opposite: risk propensity)
has corrupted the use of probabilities in the case being examined. Indeed, recent
work, notably in neuroscience, tends to show that human reasoning, confronted
with situations where knowledge is incomplete, leads to probabilistic outputs that
follow a Bayes-rule type of updating [151, 209, 346, 624, 683, 684]. The
difficulty appears, however, when it is time to explain this inference process, using
interpretative steps we typically call “providing expertise”. In a famous article, Allais
[33] showed that an explanation of certain decisions could not respect the maximum
expected gain principle formulated in Eq. (11.6).

The probabilistic encoding of expert judgment should therefore, above all, be 
seen as a useful (but certainly approximative) modeling tool that nevertheless brings 
formal coherence to judgments. Applying the rules of probability theory structures 
such judgments according to axioms of rationality, which provide a logical
framework to uncertain information. The important Cox–Jaynes theorem [186, 411] from
1946, constantly improved since then [597, 710, 739, 762], provides such axioms
and shows that probability theory is isomorphic to the logical system built
on these axioms. Certain axioms have been sharply criticized as incapable of taking
into account the reality of how knowledge is made explicit [372]. Relaxing them
has led to theorems based on belief functions [699, 705] and possibility functions 
[231], according to which two dimensions are required to correctly represent the 
plausibility of a proposition. The addition of axioms founded on a formal definition 
of risk aversion tends to lead to non-standard probability theorems (see for example 
[354]). 

However, these axioms are supposed to represent objectivised information, not
knowledge suffering from possible internal inconsistencies¹⁶ in which expressions
depend on arbitrary symbolic systems (for example, the choice of a language). Recent
results have shown that the conclusions of the Cox–Jaynes theorem remain robust
even when the axioms of rationality are slightly modified [345]. It would thus seem
legitimate to use probability theory to take into account uncertainty about a subject
examined by a cognitive system (human or machine) that could suffer from
inconsistencies.¹⁷

For these reasons, this theorem remains one of the foundations of artificial
intelligence, as well as one of the main justifications for the use of probability distributions
in representing prior information derived from expert knowledge.


16 For example, the knowledge E on X can be reformulated as knowledge on the transform w(X)
without knowing D = w(E), since w may not be easy or entirely possible for an expert to work
with.

17 Note that under the current state of knowledge, probability theory provides a pertinent
description of information related to an incompletely understood phenomenon, rather than an objective
characterization of the true variability found in the phenomenon which would be accessible to an
omniscient being [50].


11.3.2.2 Expert Judgment Based on Observed Variables


For natural events, it would seem simpler to express prior information in terms
of particular values of the observed variable X [320, 419, 420, 608], considered as
a natural anchoring variable* [421]. For extreme values, useful information means
information on distribution tails, and we may wish to provide a return level $x_\alpha$ for a
given $\alpha$, such that

$$P(X < x_\alpha) = \alpha, \tag{11.30}$$

or, inversely, for an observed level $x_\alpha$, provide an estimator of $\alpha$. Being able to
give one or several predictive prior quantiles $(x_{\alpha_i}, \alpha_i)_{1 \le i \le M}$, such that $\alpha_i < \alpha_{i+1}$ and
$x_{\alpha_i} < x_{\alpha_{i+1}}$, defined by

$$P(X < x_{\alpha_i}) = \int_{\Psi} P(X < x_{\alpha_i} \mid \psi)\, \pi(\psi)\, d\psi = \alpha_i, \tag{11.31}$$



is considered by numerous Bayesian practitioners to be the best way to encode expert 
knowledge [78, 420]. Several arguments are in favor of this: 


• On the one hand, expertise provided by an expert can be considered to be made up
of correlated information; that is, by giving a value of X, other values of X can be
suggested (typically, two probabilistic bounds on the most probable values of X).
Also, the joint distribution of prior information is invariant to any permutation of
this information (basically, the order in which the quantiles are proposed should not
matter). In other words, available information is considered non-independent but
exchangeable, which characterizes data output by a prior predictive distribution
[626].

• On the other hand, it is relatively simple to link the pair of variables $(x_{\alpha_i}, \alpha_i)$ to the
statistical interpretation (11.30) with the help of a piecewise linear cost function
(cf. Sect. 11.1.4). We should however keep in mind that in doing so, the expert
providing (11.30) may be considered neutral in terms of the risk generated by his
or her own expertise.
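
The link between a quantile and a piecewise linear cost can be sketched with a short derivation. The notation is an assumption made for illustration and is not taken from Sect. 11.1.4: $d$ is the value announced by the expert, $c_1 > 0$ the unit cost of overestimating $X$, $c_2 > 0$ the unit cost of underestimating it, and $F$, $f$ the prior predictive cdf and density of $X$.

```latex
% Piecewise linear cost of announcing d when x is realized (assumed notation)
L(d, x) = c_1 (d - x)\,\mathbb{1}_{\{x \le d\}} + c_2 (x - d)\,\mathbb{1}_{\{x > d\}}

% Expected cost under the prior predictive distribution
\mathbb{E}[L(d, X)] = c_1 \int_{-\infty}^{d} (d - x) f(x)\,dx
                    + c_2 \int_{d}^{\infty} (x - d) f(x)\,dx

% First-order condition in d
c_1 F(d) - c_2 \bigl(1 - F(d)\bigr) = 0
\quad\Longrightarrow\quad
F(d) = \frac{c_2}{c_1 + c_2} = \alpha
```

Under these assumptions, announcing $x_\alpha$ minimizes the expected cost when $\alpha = c_2 / (c_1 + c_2)$, and symmetric costs give the median. Using the expected cost as the criterion is what corresponds to a risk-neutral expert in this reading.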


Example 11.42 (Predicting sea levels) 


For the reason mentioned above, methods involving questioning a group of 
experts—who formalize the choice of a probabilistic model over a set of 
options—look to provide weights for pooling often contrary opinions across 
experts, coherently, while attempting to limit bias related to the correlation 
between experts and possible conflicts of interest (see [572] for a qualitative 
review). This type of procedure, proposed in [180], was put into practice in 
[54] in order to give a prior evaluation of future sea levels. Readers interested 
in this still largely open problem can turn to the articles [30, 166, 330] for 
more information. 


The estimation of quantities viewed as moments or combinations thereof (predic- 
tive expected value and variance of X, etc.) is trickier as it presupposes statistical 
knowledge (a common example is the frequent confusion between the median and 
mean of the intensity of a phenomenon). Also, unlike quantiles, it is not always 
true that certain moments of a probability distribution exist. For this reason, the 
numerous methods listed in [572] suggest training experts to better understand the 
questions raised by analysts and then better formulate responses.


Let us assume a prior form¹⁸ $\pi(\psi) = \pi(\psi|\omega)$ hyperparametrized by $\omega$. We can
then write

$$\tilde{\alpha}_i(\omega) = P(X < x_{\alpha_i} \mid \omega)$$

as defined by Eq. (11.31). These quantiles are related to the marginal prior density

$$f(x|\omega) = \int_{\Psi} f(x|\psi)\, \pi(\psi|\omega)\, d\psi.$$

The goodness of fit between the prior information as represented by the pairs $(x_{\alpha_i}, \alpha_i)$
and that which it is possible to model, the pairs $(x_{\alpha_i}, \tilde{\alpha}_i(\omega))$, can be characterized
by a loss function minimized with respect to $\omega$. Cooke [180], for instance, has proposed
a loss function originating in a discretized version of the Kullback–Leibler divergence
between the unknown distribution on X which gives the quantiles $x_{\alpha_i}$ (which we could
call the “expert” distribution), and the distribution $f(x|\omega)$:

$$\omega^* = \arg\min_{\omega} \sum_{i=0}^{M} (\alpha_{i+1} - \alpha_i) \log \frac{\alpha_{i+1} - \alpha_i}{\tilde{\alpha}_{i+1}(\omega) - \tilde{\alpha}_i(\omega)}, \tag{11.32}$$

with $\alpha_0 = \tilde{\alpha}_0 = 0$ and $\alpha_{M+1} = \tilde{\alpha}_{M+1} = 1$, which ensures similar relative importance
is given to the quantiles in the calibration. This expression could be assigned weights
if we want to emphasize the goodness of fit of certain prior predictive quantiles. However,
it is by no means certain that the function of $\omega$ defined by the right-hand side
of (11.32) is globally convex, which would guarantee a unique optimizer.
Decreasing as much as possible the dimensionality of $\omega$ (for instance by fixing the
other hyperparameters) makes the existence of a unique optimizer more likely.
As an aside, it can always be useful to test several loss functions
(e.g., a relative least squares loss function).
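
The calibration rule (11.32) can be illustrated with a minimal sketch. It does not use an extreme value model: to keep the prior predictive cdf in closed form, it assumes a toy exponential sampling model with a gamma prior on its rate, so that $\omega = (a, b)$. The function names, the grids and the quartile values are hypothetical choices made for the illustration.

```python
import math

def prior_predictive_cdf(x, a, b):
    """P(X < x) when X | lam ~ Exponential(lam) and lam ~ Gamma(shape=a, rate=b)."""
    return 1.0 - (b / (b + x)) ** a

def cooke_loss(levels, alphas, a, b):
    """Discretized Kullback-Leibler criterion of Eq. (11.32) for omega = (a, b)."""
    expert = [0.0] + list(alphas) + [1.0]        # alpha_0 = 0 and alpha_{M+1} = 1
    model = [0.0] + [prior_predictive_cdf(x, a, b) for x in levels] + [1.0]
    loss = 0.0
    for i in range(len(expert) - 1):
        p = expert[i + 1] - expert[i]            # expert probability of the i-th bin
        q = model[i + 1] - model[i]              # modeled probability of the same bin
        if q <= 0.0:                             # bin numerically empty under the model
            return math.inf
        loss += p * math.log(p / q)
    return loss

def calibrate_on_grid(levels, alphas, a_grid, b_grid):
    """Return (loss, a, b) minimizing the criterion over a finite grid."""
    return min((cooke_loss(levels, alphas, a, b), a, b)
               for a in a_grid for b in b_grid)

# Hypothetical expert quartiles, in the spirit of Table 11.3.
levels, alphas = [75.0, 100.0, 150.0], [0.25, 0.50, 0.75]
a_grid = [0.5 * k for k in range(1, 41)]         # shape from 0.5 to 20
b_grid = [25.0 * k for k in range(1, 161)]       # rate from 25 to 4000
best_loss, best_a, best_b = calibrate_on_grid(levels, alphas, a_grid, b_grid)
```

A grid search is used here instead of gradient descent because it does not rely on convexity of the criterion, which is not guaranteed; the price is a resolution limited by the grid step.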


Estimating $\omega^*$ necessarily requires a numerical estimate of the right-hand side
term in (11.32), which has to be sufficiently smooth to allow a gradient descent
algorithm to work. This can be achieved using quadrature methods (like the Runge–Kutta
method) or importance sampling [483]. See Example 11.43 for an illustration
of this. Stochastic optimization algorithms such as the Kiefer–Wolfowitz algorithm [437] also
perform well here.


18 By “form”, we mean the algebraic structure of the probability distribution.
Example 11.43 (Torrential rainfall)


Here, we pursue Example 11.34 and look at a Fréchet distribution $\mathcal{F}(\psi)$ for
modeling rainfall maxima in Corsica. An expert from Météo-France, schooled
in statistics, gave in [118] three rainfall values described as predictive prior
quartiles (11.30). These values $x_\alpha$ and their credibility thresholds $\alpha$ are given
in Table 11.3. Using the semi-conjugate modeling defined in Sect. 11.2.3.2 for
the Fréchet distribution, we have

$$P(X < x_\alpha) = \iint \left(1 + \frac{(x_\alpha - \mu)^{-1/\xi}}{s_1(\mu, \xi)}\right)^{-m} \pi(\xi|\mu)\, \pi(\mu)\, d\mu\, d\xi. \tag{11.33}$$

For fixed $m$, we therefore have to calibrate the hyperparameters $\omega = (x_{e_1}, x_{e_2})$
of the functions $s_1$ and $s_2$. Minimizing Cooke’s criterion (11.32) involves
evaluating (11.33) many times for different values of $\omega$. Also, the normalization
constant in $\pi(\mu)$ is unknown. For these reasons, the choice was made in [118]
to use a two-dimensional grid over $[50, 200] \times [100, 200]$, disallowing $x_{e_1} > x_{e_2}$,
and computing each estimation of (11.33) using a single importance sample

$$(\xi_k, \mu_k)_{k \in \{1, \ldots, M\}} \overset{iid}{\sim} \pi^{I}(\xi, \mu)$$

produced by

$$\mu \sim \mathcal{N}(0, 30), \qquad \xi \sim \mathcal{IG}(m, m \log 2).$$

In effect, $P(X < x_{\alpha_i})$ can be estimated consistently by

$$\frac{\sum_{k=1}^{M} \beta_k \left(1 + \dfrac{(x_{\alpha_i} - \mu_k)^{-1/\xi_k}}{s_1(\mu_k, \xi_k)}\right)^{-m}}{\sum_{k=1}^{M} \beta_k},$$

where

$$\beta_k \propto \frac{\pi(\xi_k|\mu_k)\, \pi(\mu_k)}{\pi^{I}(\xi_k, \mu_k)}$$

are importance weights. The choice of $\pi^{I}(\xi, \mu)$ here is specifically fitted to
the case study in order to obtain relatively balanced weights (see [652] for
more details): we simulate $\mu$ around a plausible value of the expectation of the
Fréchet distribution and $\xi$ by supposing a plausible ratio of 2 between $x_{e_1} - \mu$
and $x_{e_2} - \mu$.


Table 11.3 Statistics-based specification of expert knowledge on torrential rain in Corsica
(Example 11.34 cont.) given as a triple of predictive prior quartiles

α (confidence level)    Rainfall x_α (mm)
25%                     75
50%                     100
75%                     150


Table 11.4 Calibration of the hyperparameters $(x_{e_1}, x_{e_2})$ of the Fréchet prior for Example 11.43,
in agreement with the available expertise on torrential rainfall in Corsica

Virtual size m    x_e1      x_e2      Effective quantile thresholds
                                      (75 mm)    (100 mm)    (150 mm)
1                 100.00    127.27    25%        50%         75%
2                 95.45     136.36    23%        49%         73%
3                 90.90     133.33    23%        51%         73%
5                 87.91     133.31    23%        50%         74%
7                 86.36     133.28    24%        52%         73%
10                84.84     131.81    25%        52%         74%
15                84.78     131.24    25%        52%         74%


In doing so, all of the estimates of (11.33) are correlated, but the search 
for the optimum (11.32) becomes insensitive to the sampling. The calibration 
results produced are slightly biased due to correlation, but globally satisfactory 
as they reproduce the expert-defined constraints quite well (Table 11.4). An 
approach based on an estimation of the integrals in (11.33) using quadrature 
was slower and did not give better results [618]. 
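
The idea of estimating a prior predictive probability with a single, reused importance sample and self-normalized weights can be sketched as follows. This is a toy setting chosen so that the answer is known in closed form, not the Fréchet model of the example: it assumes an exponential sampling model whose rate has a gamma prior known only up to its normalizing constant. All function names are hypothetical.

```python
import math
import random

def self_normalized_is(h, target_kernel, sample_instrumental, instrumental_pdf, size, rng):
    """Estimate E[h(psi)] under the density proportional to `target_kernel`."""
    draws = [sample_instrumental(rng) for _ in range(size)]
    weights = [target_kernel(p) / instrumental_pdf(p) for p in draws]
    # The unknown normalizing constant cancels in the ratio of the two sums.
    return sum(w * h(p) for w, p in zip(weights, draws)) / sum(weights)

# Toy assumption: X | lam ~ Exponential(lam) and lam ~ Gamma(shape=3, rate=2).
def prior_kernel(lam):
    return lam ** 2 * math.exp(-2.0 * lam)       # Gamma(3, 2) density without its constant

def estimate_predictive_cdf(x, size=20000, seed=0):
    rng = random.Random(seed)                    # same seed: the same sample is reused for every x
    return self_normalized_is(
        h=lambda lam: 1.0 - math.exp(-lam * x),  # P(X < x | lam)
        target_kernel=prior_kernel,
        sample_instrumental=lambda r: r.expovariate(1.0),
        instrumental_pdf=lambda lam: math.exp(-lam),
        size=size,
        rng=rng,
    )

def exact_predictive_cdf(x):
    """Closed-form prior predictive cdf of the toy model, used only as a check."""
    return 1.0 - (2.0 / (2.0 + x)) ** 3
```

Because the same draws are reused for every level `x`, the estimated cdf is exactly monotone in `x`, which is one way to see why an optimizer driven by such estimates is not disturbed by sampling noise, at the price of correlated estimates.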


Remark 11.28 As noticed by Berger [78], chap. 3, we should not expect a perfect
fit between the $\alpha_i$ and the $\tilde{\alpha}_i$ because the former are provided without knowledge
of the sampling model for extreme values used in the problem, whereas the parametric
distributions have limited flexibility. The choice of a prior coherent with the
sampling model¹⁹ implies that certain constraints on the quantiles cannot be satisfied.
Hence, it makes sense to prioritize a prior modeling that can satisfy one or two
specifications (typically those for which our confidence is highest) and calibrate
the hyperparameters using a rule like (11.32).


19 It is not therefore a relaxation of the sampling model, which could be interpreted as injecting an
error into the model.


Prior return levels and “hidden” historical data. It is of interest to note that return
levels $(x_{\alpha_i})_i$ specified by an expert can also be interpreted as an information source
corresponding to a virtual historical dataset of size $m$, the latter defined by the highest
value of the precision $\alpha$. A value of $\alpha = 0.9$ (decennial level) requires that an expert
“is aware of” at least 9 historical values below $x_\alpha$ and one above (i.e., $m \ge 10$). Upon
accepting this hypothesis and not allowing $m$ to change, it becomes meaningful to
associate with the provided expertise the censored likelihood defined by

$$L(x_\alpha|\psi) = \left[F(x_\alpha|\psi)\right]^{\alpha m} \left[1 - F(x_\alpha|\psi)\right]^{(1-\alpha) m},$$

and to define the prior as

$$\pi^{m}(\psi) \propto L(x_\alpha|\psi)\, \pi^{J}(\psi),$$

where $\pi^{J}(\psi)$ is a noninformative prior. An alternative to the method (11.32) for
calibrating a vector of hyperparameters $\omega$ can therefore be proposed for a given
choice of prior distribution $\pi(\psi|\omega)$ by minimizing a distance or divergence $\mathcal{D}$ [129]
between the distributions:

$$\omega^* = \arg\min_{\omega} \mathcal{D}\left(\pi^{m}(\psi) \,\|\, \pi(\psi|\omega)\right).$$

To our knowledge, however, this approach has not yet been attempted.
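
As a minimal sketch of the censored likelihood above, the following code evaluates its logarithm for a single expert return level. The Gumbel form chosen for $F$ is an assumption made only to keep the example short, and the function names are hypothetical.

```python
import math

def gumbel_cdf(x, mu, sigma):
    """Gumbel cdf, used here as a stand-in for F(x | psi) with psi = (mu, sigma)."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def virtual_log_likelihood(x_alpha, alpha, m, mu, sigma):
    """Log of [F(x_alpha)]^(alpha m) * [1 - F(x_alpha)]^((1 - alpha) m)."""
    p = gumbel_cdf(x_alpha, mu, sigma)
    p = min(max(p, 1e-300), 1.0 - 1e-16)         # keep both logarithms finite
    return alpha * m * math.log(p) + (1.0 - alpha) * m * math.log(1.0 - p)

# Hypothetical statement: the decennial level (alpha = 0.9) is 200 mm, with m = 10.
sigma = 50.0
mu_star = 200.0 + sigma * math.log(-math.log(0.9))   # location for which F(200) = 0.9
value_at_mu_star = virtual_log_likelihood(200.0, 0.9, 10, mu_star, sigma)
```

For fixed `sigma`, this function of `mu` is largest where the model reproduces the expert statement exactly, which is the behaviour expected from 9 virtual observations below the level and one above.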


11.3.2.3 Expert Knowledge (Direct or Indirect) on Parameter Values 


To suppose that prior information is available on the values of $\psi$ (by simple algebraic
relationships) is by nature risky. The information provider is expected to know the
parametric form and properties of the extreme value statistical model, and thus
provide predictions that are perfectly coherent with it. In the approach described in Sect.
11.3.2.2, information is available on the prior marginal probability (11.30), whereas
the parametric approach carries information on $P(X < x_\alpha|\psi)$ directly. Though the
expert advice still seems (in general) to be based on the observed variable X, the
latter is no longer considered unconditionally on $\psi$.

Calibration of the hyperparameters $\omega$ of a prior distribution $\pi(\psi|\omega)$ with a known
form is generally performed by supposing that information is transmitted in the form
of pairs $(x_{\alpha_i}, \alpha_i)$, such that

$$P(X < x_{\alpha_i}|\psi) = \alpha_i. \tag{11.34}$$

A set of values $(\psi_k)_{k \in S}$ for $\psi$ can be obtained by using basic algebra. Many authors
propose calibrating features of $\pi(\psi)$ (moments, etc.) with the help of the $\psi_k$.
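
The “basic algebra” step can be sketched on the simplest possible case. The example assumes a two-parameter Gumbel model (not one of the models used by the authors cited here), for which two pairs of the form (11.34) determine $\psi = (\mu, \sigma)$ exactly. The function names and numerical values are hypothetical.

```python
import math

def gumbel_from_two_quantiles(x1, a1, x2, a2):
    """Solve F(x1) = a1 and F(x2) = a2 for F(x) = exp(-exp(-(x - mu) / sigma))."""
    # Reduced variates: F(x) = a  <=>  (x - mu) / sigma = -log(-log(a)).
    y1 = -math.log(-math.log(a1))
    y2 = -math.log(-math.log(a2))
    sigma = (x2 - x1) / (y2 - y1)
    mu = x1 - sigma * y1
    return mu, sigma

def gumbel_cdf(x, mu, sigma):
    return math.exp(-math.exp(-(x - mu) / sigma))

# Hypothetical expert statement: median level 100 mm, decennial level 200 mm.
mu, sigma = gumbel_from_two_quantiles(100.0, 0.5, 200.0, 0.9)
```

Repeating this computation for several experts, or for several pairs given by the same expert, yields the set of values $\psi_k$ from which features of the prior are then calibrated.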

For instance, interested in a Poisson distribution for severe earthquake events, 
[135] has linked expert knowledge on the frequency of occurrence directly to parame- 
ters of a prior distribution. In the MAXB setting, Coles and Tawn [179] have proposed 
to use expert statements, interpreted as parametric quantiles, to construct gamma pri- 
ors on the parameters of a GEV distribution. In [706] (Sect. 3.4), numerous technical 
details on how to calibrate gamma distributions are provided, based on algebraic 
relationships and approximations to the behavior of these distributions. [706] has 
also looked at the extension of these methods to the bivariate case while suppos- 
ing that information from experts exists on the range parameter of a dependency 
relationship between two GEV distributions. This corresponds to the maximal geo- 
graphic distance under which we suppose that catastrophic events at the two sites 
remain correlated. Nevertheless, this approach remains closely tied to the particular
case study considered.

Moving back to the univariate MAXB setting, Gaioni and his coauthors [298] have
constructed samples of $(\psi_k)$ and conducted inference on the means and variances of
a GEV distribution $\pi(\psi)$. They apply this method to estimate the occurrence of flash
floods in watercourses. A necessary precondition is to know three distinct quantiles
(11.34), which is similar to the requirements in [179].

In the POT framework, [220] (Appendix 2) proposed to calibrate the modal value
of $\pi(\psi)$ from constraints placed on the quantiles. When predicting heavy rainfall
[177] and flooding [589], these authors considered that experts interrogated on X
were capable of providing two types of information. The first was on the mean annual
number of events exceeding the extreme threshold $u$, while the second was on differences
between extreme value quantiles (return levels); for example:

$x_{0.99} - x_{0.9}$ = difference between centennial and decadal levels,

$x_{0.999} - x_{0.99}$ = difference between millennial and centennial levels,

but also on the ratio of these differences, such as:

$$\frac{x_{0.999} - x_{0.99}}{x_{0.99} - x_{0.9}}.$$

Here again, sampling work on the prior values of $\psi$ output from algebraic
relationships allows calibration of the prior distributions.
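
A short worked derivation shows why such a ratio is informative. It assumes the usual POT return-level expression for a generalized Pareto tail with shape $\xi \neq 0$, scale $\sigma$, threshold $u$ and mean annual number of exceedances $\lambda$, and it identifies $x_{1-1/N}$ with the $N$-year return level. This parametrization is an assumption and is not taken from [177, 589].

```latex
% N-year return level under the assumed POT parametrization
x_{1-1/N} \approx u + \frac{\sigma}{\xi}\left[(\lambda N)^{\xi} - 1\right]

% Differences between successive return levels
x_{0.99} - x_{0.9} \approx \frac{\sigma}{\xi}\,\lambda^{\xi}\left(100^{\xi} - 10^{\xi}\right),
\qquad
x_{0.999} - x_{0.99} \approx \frac{\sigma}{\xi}\,\lambda^{\xi}\left(1000^{\xi} - 100^{\xi}\right)

% Ratio of the differences: sigma, lambda and u cancel
\frac{x_{0.999} - x_{0.99}}{x_{0.99} - x_{0.9}}
\approx \frac{10^{2\xi}\left(10^{\xi} - 1\right)}{10^{\xi}\left(10^{\xi} - 1\right)}
= 10^{\xi}
```

Under these assumptions the ratio depends only on the shape parameter, so an expert statement about it translates into $\xi \approx \log_{10}(\text{ratio})$, while each difference then carries information on $\sigma \lambda^{\xi}$.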


Though this approach is simpler than the one in Sect. 11.3.2.2 (which explains
why it is used so often), this type of calibration is arbitrary because it requires us
to define estimators of features of $\pi(\psi)$ using a very small number of prior values,
with the help of formulas like (11.34). In our opinion, preferring one estimator over
the others based on rules from classical (non-Bayesian) statistics does not make a lot
of sense. For this reason, we recommend the marginal vision of prior information
given in Sect. 11.3.2.2 and invoking the parametric point of view only if we have
access to a large quantity of prior information.


11.4 Bayesian Computational Methods 
11.4.1 General Principle 


In the simplest case (one-dimensional, iid data), the posterior density $\pi(\psi|\mathbf{x}_n)$ of the
parameters of extreme value models is known up to a constant: the (integral) normalizing
constant in Eq. (11.1). Numerical integration techniques (e.g., the Runge–Kutta
method) can be used to calculate it. However, it is necessary to know the whole posterior
distribution, whose tails influence the determination of credibility regions and
Bayesian predictions.

A general approach is to simulate a sample $\Psi_M = (\psi_1, \ldots, \psi_M)$ from $\pi(\psi|\mathbf{x}_n)$,
which then allows Monte Carlo estimation of quantities of interest like (11.6). This
simulation is necessarily indirect, i.e., it draws from a simpler distribution $\rho(\psi)$
(called instrumental), using the following accept-reject algorithm (see [652], Sects.
2.3 and 2.4 for technical details and proofs):


Accept-reject method


• Initialization: $\Psi_M = \emptyset$.
• While $\mathrm{Card}(\Psi_M) < M$:

1. Simulate $\psi_k \sim \rho(\psi)$;
2. Test whether $\psi_k$ could have been output by $\pi(\psi|\mathbf{x}_n)$;
3. If yes, update: $\Psi_M = \Psi_M \cup \{\psi_k\}$.


This test accepts or rejects the candidate $\psi_k$ at each step. If the choice of $\rho(\psi)$
is far from $\pi(\psi|\mathbf{x}_n)$, the reject frequency is high and the algorithm works poorly in
practice (Fig. 11.5). An efficient approximation to a sample with distribution $\pi(\psi|\mathbf{x}_n)$
can then be output iteratively by:


1. producing a first-order Markov chain on $\Psi$ with stationary distribution $\pi(\psi|\mathbf{x}_n)$
(cf. Sect. 4.2),

2. using an instrumental distribution $\tilde{\psi}_k \sim \rho_k(\psi|\mathcal{F}_{k-1})$ which evolves over
iterations k of the chain and can be constructed dependent on the whole history
$\mathcal{F}_{k-1}$ of the chain,

3. selecting and decorrelating a simulated sample under the stationary distribution.
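
The accept-reject method described above can be sketched as follows. The test of step 2 is implemented in its standard form, which assumes a known constant `bound` such that the unnormalized target never exceeds `bound` times the instrumental density. The toy target (a Beta(2, 5) kernel on [0, 1]) and all names are hypothetical.

```python
import random

def accept_reject(target_kernel, sample_instrumental, instrumental_pdf, bound, size, rng):
    """Draw `size` values from the density proportional to `target_kernel`.

    Requires target_kernel(psi) <= bound * instrumental_pdf(psi) for all psi.
    Returns the sample and the observed acceptance rate.
    """
    sample = []
    n_trials = 0
    while len(sample) < size:
        psi = sample_instrumental(rng)
        n_trials += 1
        # Accept the candidate with probability kernel / (bound * instrumental density).
        if rng.random() * bound * instrumental_pdf(psi) <= target_kernel(psi):
            sample.append(psi)
    return sample, size / n_trials

def kernel(psi):
    """Toy posterior on [0, 1], known up to its normalizing constant."""
    return psi * (1.0 - psi) ** 4

rng = random.Random(0)
draws, acceptance_rate = accept_reject(
    kernel,
    sample_instrumental=lambda r: r.random(),    # uniform instrumental distribution
    instrumental_pdf=lambda psi: 1.0,
    bound=0.2 * 0.8 ** 4,                        # maximum of the kernel, reached at psi = 0.2
    size=5000,
    rng=rng,
)
```

The observed acceptance rate makes the efficiency issue concrete: the further the instrumental distribution is from the target, the larger `bound` must be and the more candidates are wasted.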
Fig. 11.5 Two instrumental ... density


11.4.2 Markov Chain Monte Carlo Methods 


Monte Carlo sampling algorithms based on the use of Markov chains are usually
named MCMC, for Markov Chain Monte Carlo algorithms. These have become the
go-to tool in Bayesian calculation [652], growing in importance in parallel with
the rise in computing power. The most advanced such inference methods, working
in high-dimensional²⁰ settings and for complex statistical likelihoods and integration
constants in (11.1), continually generalize and improve on the underlying
Metropolis–Hastings algorithm, which we describe below. Readers interested in a
more theoretical approach should turn to the book [652] and to [501] for practical
implementation details. Also, the article [638] gives a useful methodology for the
extreme value statistical setting. The accelerated MCMC methods evoked above
are inspired by physics (Langevin dynamics) and link up with stochastic gradient
descent techniques. See Sect. 11.4.4 for some additional details; the interested reader
is advised to read the article [552] for a recent review of these approaches.


20 Including the number of latent variables in hierarchical models (Sect. 11.2.4). 
Hastings–Metropolis algorithm.

• Let $\{\psi_1, \ldots, \psi_{k-1}\}$ be a chain (initialized for example by simulating from the prior
distribution).

• Select a new value $\psi_k$:


1. Generate a new candidate $\tilde{\psi}_k \sim \rho_k(\psi|\psi_{k-1})$.
2. Calculate the Metropolis ratio

$$\alpha_k = \min\left\{1,\; \frac{f(\mathbf{x}_n|\tilde{\psi}_k)\, \pi(\tilde{\psi}_k)\, \rho_k(\psi_{k-1}|\tilde{\psi}_k)}{f(\mathbf{x}_n|\psi_{k-1})\, \pi(\psi_{k-1})\, \rho_k(\tilde{\psi}_k|\psi_{k-1})}\right\}.$$

3. Statistical test: let $U_k \sim \mathcal{U}[0, 1]$; choose

$$\psi_k = \begin{cases} \tilde{\psi}_k & \text{if } U_k \le \alpha_k, \\ \psi_{k-1} & \text{otherwise.} \end{cases}$$


The Metropolis ratio involves the ratio of the posterior distributions (which allows
us to remove the integration constant): if $\tilde{\psi}_k$ is in a higher-density zone of $\pi(\psi|\mathbf{x}_n)$
than $\psi_{k-1}$, the ratio will be greater than one. The inverse ratio of the instrumental
distributions prevents us from automatically accepting the new point, thus helping the
Markov chain to better explore the whole space $\Psi$. A candidate point is accepted with
probability $\alpha_k$. Here again, for more details on the convergence of MCMC chains
produced in this way, we recommend turning to [652]. In our situation, a typical
choice of instrumental distribution is a Gaussian, which implies that the sequence of
candidates is a random walk [501]:

$$\tilde{\psi}_k = \psi_{k-1} + \varepsilon_k \quad \text{with} \quad \varepsilon_k \sim \mathcal{N}(0, \tau_k^2),$$

and the sequence $(\tau_k^2)_k$ converges²¹ to a limit value $\tau^2 > 0$. See [41, 654] for practical
examples and methodological suggestions (Fig. 11.6).
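
A minimal sketch of the random-walk version of the algorithm is given below for a scalar parameter. It keeps the step size `tau` fixed rather than adapting it over iterations, and it targets a toy posterior (the kernel of a gamma distribution with shape 22 and rate 11) chosen because its mean is known. Both choices, and all names, are assumptions made for the illustration.

```python
import math
import random

def random_walk_metropolis(log_posterior, start, tau, n_iter, rng):
    """Random-walk Metropolis sampler for a scalar parameter.

    `log_posterior` is the log of the posterior density up to an additive constant.
    """
    chain = [start]
    current, current_lp = start, log_posterior(start)
    for _ in range(n_iter):
        candidate = current + rng.gauss(0.0, tau)
        candidate_lp = log_posterior(candidate)
        # The Gaussian proposal is symmetric, so the instrumental densities cancel
        # and only the ratio of unnormalized posterior densities remains.
        accept_prob = math.exp(min(0.0, candidate_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = candidate, candidate_lp
        chain.append(current)
    return chain

def log_posterior(lam):
    """Toy target: Gamma(shape=22, rate=11) kernel, whose mean is 2."""
    if lam <= 0.0:
        return -math.inf
    return 21.0 * math.log(lam) - 11.0 * lam

rng = random.Random(1)
chain = random_walk_metropolis(log_posterior, start=1.0, tau=0.5, n_iter=20000, rng=rng)
kept = chain[2000:]                              # discard a burn-in period
posterior_mean = sum(kept) / len(kept)
```

Working with log densities avoids numerical underflow when the likelihood involves many observations, and candidates outside the support are rejected automatically because their log density is minus infinity.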

If the parameter $\psi = (\psi_1, \ldots, \psi_d)$ is multidimensional (notably for hierarchical
models), it is recommended to use Gibbs sampling [652]. Suppose that we have
conditional posterior distributions

$$\psi_1 \mid \psi_2, \ldots, \psi_d, \mathbf{x}_n \sim \pi(\psi_1|\psi_2, \ldots, \psi_d, \mathbf{x}_n),$$
$$\psi_2 \mid \psi_1, \psi_3, \ldots, \psi_d, \mathbf{x}_n \sim \pi(\psi_2|\psi_1, \psi_3, \ldots, \psi_d, \mathbf{x}_n),$$
$$\ldots$$

Then the Markov chain produced by the iterative conditional simulations also converges
to the posterior distribution under quite general hypotheses [652]. When conditional
posterior distributions are not themselves completely explicit, a hybrid strategy
called Metropolis-Hastings-within-Gibbs (MHwG) [414] can be used, consisting in
running a Metropolis test in each dimension.


21 Such MCMCs are called adaptive [46].


Fig. 11.6 Three MCMC chains converging in parallel to the same posterior target distribution. The
burn-in period corresponds to the number of iterations necessary for the chains to mix, which is an
indication that stationarity has been attained


To illustrate how this algorithm works, let us take the conditional structure of 
the posterior distribution (11.19) on the Fréchet parameters, applied to Example 
11.43 on torrential rainfall in Corsica. A typical MHwG algorithm for simulating 
this distribution takes the following form. 


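Example 11.44 (MHwG algorithm for Fréchet (Example 11.43 cont.))


The Fréchet likelihood is written, for $\psi = (\mu, \nu, \xi)$, with $\nu = \sigma^{1/\xi} > 0$,

$$f(\mathbf{x}_n|\nu, \mu, \xi) = \left(\frac{\nu}{\xi}\right)^{n} \prod_{i=1}^{n} (x_i - \mu)^{-1/\xi - 1} \exp\left\{-\nu \sum_{i=1}^{n} (x_i - \mu)^{-1/\xi}\right\}.$$

The MHwG algorithm takes the following form.

Step 0. Initialize $(\nu_0, \mu_0, \xi_0)$.

Step $k \ge 0 \to k+1$.

1. Simulate $\nu_{k+1} \sim \mathcal{G}\left(m + n,\; s_1(\mu_k, \xi_k) + \sum_{i=1}^{n} (x_i - \mu_k)^{-1/\xi_k}\right)$.

2. Simulate a candidate $\tilde{\xi}_{k+1} \sim \rho_\xi(\xi|\xi_k)$;

• calculate $\alpha_{1,k+1}$ as the minimum of 1 and

$$\frac{f(\mathbf{x}_n|\nu_{k+1}, \mu_k, \tilde{\xi}_{k+1})\, \pi(\nu_{k+1}|\tilde{\xi}_{k+1}, \mu_k)\, \pi(\tilde{\xi}_{k+1}|\mu_k)\, \rho_\xi(\xi_k|\tilde{\xi}_{k+1})}{f(\mathbf{x}_n|\nu_{k+1}, \mu_k, \xi_k)\, \pi(\nu_{k+1}|\xi_k, \mu_k)\, \pi(\xi_k|\mu_k)\, \rho_\xi(\tilde{\xi}_{k+1}|\xi_k)};$$

• accept $\xi_{k+1} = \tilde{\xi}_{k+1}$ with probability $\alpha_{1,k+1}$,
otherwise choose $\xi_{k+1} = \xi_k$.

3. Simulate a candidate $\tilde{\mu}_{k+1} \sim \rho_\mu(\mu|\mu_k)$;

• calculate $\alpha_{2,k+1}$ as the minimum of 1 and

$$\frac{f(\mathbf{x}_n|\nu_{k+1}, \tilde{\mu}_{k+1}, \xi_{k+1})\, \pi(\nu_{k+1}|\xi_{k+1}, \tilde{\mu}_{k+1})\, \pi(\xi_{k+1}|\tilde{\mu}_{k+1})\, \pi(\tilde{\mu}_{k+1})\, \rho_\mu(\mu_k|\tilde{\mu}_{k+1})}{f(\mathbf{x}_n|\nu_{k+1}, \mu_k, \xi_{k+1})\, \pi(\nu_{k+1}|\xi_{k+1}, \mu_k)\, \pi(\xi_{k+1}|\mu_k)\, \pi(\mu_k)\, \rho_\mu(\tilde{\mu}_{k+1}|\mu_k)};$$

• accept $\mu_{k+1} = \tilde{\mu}_{k+1}$ with probability $\alpha_{2,k+1}$,
otherwise choose $\mu_{k+1} = \mu_k$.


Simulations were run choosing for $\rho_\xi(\cdot|\xi_k)$ a normal distribution truncated
below zero, with mean $\xi_k$ and coefficient of variation 20%, and for $\rho_\mu(\cdot|\mu_k)$
a normal distribution truncated above $x_{e_1}$, with mean $\mu_k$ and the same
coefficient of variation. Convergence of the MCMC chains is illustrated for the
parameter $\xi$ in Fig. 11.7 for the case where the prior distribution is basically
noninformative. Several thousand iterations are required before the chains start
to mix. Calculated after a decorrelation step, the posterior characteristics
of the parameters and several return levels are summarized in Table 11.5.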


Predictive posterior densities are also plotted, along with the histogram of
observed rainfall, in Fig. 11.8. The influence of the choice of prior is visible
in the form of the density function: the bigger the $m$, the larger the weight
given to the expert’s advice. The asymmetry in the data distribution with
respect to the value of 100 mm (which is the empirical median) is slightly
pessimistic regarding the expert’s advice. This explains the shift to the left of
the predictive distribution as the weight of the expert’s advice increases. This
progressive increase in the force of the expert’s knowledge has little influence
on the decadal return level (around 200 mm, in agreement with the advice of
the Météo-France expert), but much more on the centennial return level, where
the uncertainty decreases dramatically. Despite the failure to take into account
a trend linked to climate change, a centennial level slightly above 400 mm
appeared quite plausible to the expert, who was a specialist in Mediterranean
weather patterns and Corsican micro-climates.
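
The Metropolis-within-Gibbs mechanics used in this example can be sketched on a simpler model. The code below is not the Fréchet sampler of Example 11.44: it assumes a two-parameter Gumbel model with a flat prior on $\mu$ and a prior proportional to $1/\sigma$, simulated data, and fixed random-walk step sizes. All of these are assumptions made to keep the sketch short and checkable.

```python
import math
import random
import statistics

def log_posterior(data, mu, sigma):
    """Unnormalized log-posterior of a Gumbel(mu, sigma) model.

    Assumed prior (illustration only): flat on mu, proportional to 1/sigma on sigma.
    """
    if sigma <= 0.0:
        return -math.inf
    try:
        z = [(x - mu) / sigma for x in data]
        log_lik = -len(data) * math.log(sigma) - sum(z) - sum(math.exp(-v) for v in z)
    except OverflowError:                        # candidate far outside the plausible region
        return -math.inf
    return log_lik - math.log(sigma)

def metropolis_step(current, current_lp, step, log_target, rng):
    """One symmetric random-walk Metropolis update of a single coordinate."""
    candidate = current + rng.gauss(0.0, step)
    candidate_lp = log_target(candidate)
    if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
        return candidate, candidate_lp
    return current, current_lp

def mh_within_gibbs(data, start, steps, n_iter, rng):
    """Update mu given sigma, then sigma given mu, with one Metropolis test each."""
    mu, sigma = start
    lp = log_posterior(data, mu, sigma)
    chain = []
    for _ in range(n_iter):
        mu, lp = metropolis_step(mu, lp, steps[0], lambda v: log_posterior(data, v, sigma), rng)
        sigma, lp = metropolis_step(sigma, lp, steps[1], lambda v: log_posterior(data, mu, v), rng)
        chain.append((mu, sigma))
    return chain

# Simulated data (hypothetical): 200 Gumbel maxima with mu = 100 and sigma = 20.
rng = random.Random(2)
uniforms = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in range(200)]
data = [100.0 - 20.0 * math.log(-math.log(u)) for u in uniforms]
start = (statistics.mean(data), statistics.stdev(data))
chain = mh_within_gibbs(data, start, steps=(2.0, 1.5), n_iter=6000, rng=rng)
kept = chain[1000:]                              # discard a burn-in period
mu_hat = statistics.mean(m for m, _ in kept)
sigma_hat = statistics.mean(s for _, s in kept)
```

In the Fréchet case of the example, the first coordinate is drawn exactly from its gamma conditional and only the remaining two coordinates need a Metropolis test; the sketch above applies the test to every coordinate because no conditional is explicit in this toy model.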
Fig. 11.7 Paths of four MCMC chains towards the posterior distribution of $\xi$ for a very small
virtual data size of $m = 0.1$ (Fréchet distribution)


Table 11.5 Posterior results as a function of the size of the virtual sample m accorded to prior
expertise, for the Fréchet model and conditional on rainfall data in Corsica. The results shown are
the posterior expectations; standard deviations are in brackets. Calculations were performed after
50,000 MCMC iterations

Virtual size m    μ          σ          ξ              Return level (mm)
                                                       10 years               100 years
                                                       (decennial rainfall)   (centennial rainfall)
?                 −8 (?9)    110 (?)    0.30 (0.04)    210 (27)               465 (190)
5                 −2 (?7)    98 (22)    0.32 (0.07)    208 (?5)               449 (109)
10                3 (?)      90 (?)     0.34 (0.03)    202 (22)               420 (?6)
15                5 (?)      86 (?)     0.35 (0.03)    200 (?3)               410 (?7)


11.4.3 Reconstruction of Incomplete Data 


Under the Bayesian framework, no distinction is made between parameters and
variables in terms of probabilistic nature. Incomplete data, which may come from
reaching a technical limit or a bias of some measuring device, or from historical missing
data censored by one or several historical point values (cf. Examples 4.8 and 4.9),
may potentially be reconstructed by conditioning on these historical values and a
current estimate of the parameters. Other types of incomplete data involve mixture
models of extreme value distributions (Sect. 12.2.2.3); in such cases, the “missing
part” of a data point is its attribution to one of the models in the mixture. Latent variables in
hierarchical models (Sect. 11.2.4) are yet another example.


Reconstruction of such types of incomplete data is called data augmentation, 
which can be done iteratively and naturally falls under the Gibbs framework. 
This method is the Bayesian counterpart to the Expectation—Maximization (EM) 
algorithm and its generalizations [510], which iteratively reconstruct a complete 
dataset (or a related statistic) in order to run maximum likelihood estimation without 
being bothered by complicated integrals in the definition of the data likelihood [690]. 
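
Data augmentation within a Gibbs sampler can be sketched on a deliberately simple case. The example assumes an exponential model with right-censored observations and a conjugate gamma prior on the rate, not an extreme value distribution, so that both conditional simulations are explicit and the result can be checked against the known posterior. Names and data are hypothetical.

```python
import random

def gibbs_data_augmentation(observed, censoring_times, a, b, n_iter, rng):
    """Gibbs sampler for the rate of an exponential model with right-censored data.

    Prior: rate ~ Gamma(shape=a, rate=b). Each censored value is only known
    to exceed its censoring time.
    """
    n = len(observed) + len(censoring_times)
    rate = a / b                                 # start from the prior mean
    rates = []
    for _ in range(n_iter):
        # Step 1: reconstruct the censored values given the current rate
        # (memoryless property: X given X > c is c plus an exponential variable).
        completed = [c + rng.expovariate(rate) for c in censoring_times]
        # Step 2: draw the rate from its conjugate conditional posterior given
        # the completed dataset; gammavariate expects a scale, hence the inverse.
        total = sum(observed) + sum(completed)
        rate = rng.gammavariate(a + n, 1.0 / (b + total))
        rates.append(rate)
    return rates

# Hypothetical data: five exact values and three values censored at 3.0.
observed = [1.2, 0.7, 2.5, 0.3, 1.9]
censoring_times = [3.0, 3.0, 3.0]
rng = random.Random(3)
rates = gibbs_data_augmentation(observed, censoring_times, a=1.0, b=1.0, n_iter=20000, rng=rng)
kept = rates[1000:]                              # discard a burn-in period
posterior_mean = sum(kept) / len(kept)
```

In this toy case the censored likelihood is tractable and the exact posterior is a gamma distribution with shape 6 and rate 16.6, so augmentation is not needed; the interest of the scheme lies in models where only the complete-data conditional is easy to simulate.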


Curiously, though this type of approach is well-known in the reliability setting— 
examined in numerous missing data situations—it remains little used in extreme 
value statistics. [686] has looked only at the use of EM algorithms in the mixture 
model setting. In the Bayesian case, only [676, 677, 679] seem to have proposed the 
Gibbs algorithms for dealing with data augmentation in the extreme value statistics 
setting, in order to run a regional study with historical data and resolve multivariate 
inference problems. 


Experiments conducted at EDF for developing EM-type algorithms for GEV
distributions have shown that this approach can be highly computationally intensive
since three-dimensional functionals have to be optimized at each iteration. Stochastic
approaches, either SEM [146] or Gibbs (in the Bayesian setting), which simulate
the complete data, appear to be better candidates for improving the estimation of
extreme value distributions using incomplete data. Formalizing and scaling up such
approaches to the industrial setting, if they prove useful, is an interesting research
avenue.


11.4.4 Other Computational Approaches 


The use of MCMC as applied to extreme value models (in simple cases) can be
greatly simplified with the help of the OpenBUGS software [726], which provides
an implementation of the hybrid Gibbs techniques.²² The major constraint of this
software is that it requires input Bayesian models to have proper priors. Indeed,
proper priors are needed, among other things, to initialize the MCMC chains and
for the instrumental simulation to run smoothly. More recent MCMC methods [550]
are based on an analogy between the Markov chain and deterministic simulations of
molecules following Hamiltonian dynamics; using the gradient of the target density
to guide the evolution of the chain leads to much less correlation between elements
of the chain than for standard MCMC, as well as being more efficient (in terms of
run time) [92]. This approach is thus particularly useful for dealing with
high-dimensional spaces in hierarchical and/or multivariate problems, for instance
with the help of the Stan software [141]. Most of these software packages can link
up with R. See [446] for a well-presented look at how to implement them.
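
As a minimal illustration of MCMC on a simple extreme value model, the sketch below runs a plain random-walk Metropolis sampler on the posterior of the three GEV parameters. It is neither the hybrid Gibbs scheme of OpenBUGS nor the Hamiltonian approach of Stan; the proper priors, the step sizes, the function names, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import genextreme, norm

def log_posterior(theta, x):
    """Log posterior of GEV(mu, sigma, xi), parametrized by (mu, log sigma, xi)."""
    mu, log_sigma, xi = theta
    sigma = np.exp(log_sigma)
    # scipy's shape parameter c has the opposite sign: c = -xi
    loglik = genextreme.logpdf(x, c=-xi, loc=mu, scale=sigma).sum()
    # Assumed proper priors (illustrative only)
    logprior = (norm.logpdf(mu, 0.0, 100.0)
                + norm.logpdf(log_sigma, 0.0, 10.0)
                + norm.logpdf(xi, 0.0, 0.5))
    return loglik + logprior

def metropolis(x, n_iter=5000, step=(0.15, 0.1, 0.08), seed=0):
    """Random-walk Metropolis; returns the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    # Crude starting point: sample mean, log of sample std, Gumbel shape
    theta = np.array([np.mean(x), np.log(np.std(x)), 0.0])
    current = log_posterior(theta, x)
    chain = np.empty((n_iter, 3))
    accepted = 0
    for i in range(n_iter):
        proposal = theta + rng.normal(0.0, step)
        candidate = log_posterior(proposal, x)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter

# Simulated block maxima with mu = 10, sigma = 2, xi = 0.1 (hypothetical values)
x_obs = genextreme.rvs(c=-0.1, loc=10.0, scale=2.0, size=100, random_state=1)
chain, acc_rate = metropolis(x_obs)
post = chain[1000:]                      # discard burn-in
print("acceptance rate:", round(acc_rate, 2))
print("posterior means (mu, sigma, xi):",
      post[:, 0].mean(), np.exp(post[:, 1]).mean(), post[:, 2].mean())
```

Sampling log σ rather than σ keeps the scale positive without extra rejection steps; in practice the step sizes would need tuning and the chain would need convergence diagnostics.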

This computational setting is under constant development. For instance,
variational Bayes methods can be used to quickly approximate the main characteristics
of $\pi(\psi \mid \mathbf{x}_n)$ when the posterior distribution is difficult to get using MCMC
(see for example [435]). In an application involving heavy rainfall in Australia (in
the MAXB setting), [556] highlighted how variational Bayes methods can be useful
when looking to evaluate the impact of continuous covariates on the occurrence and
intensity of extreme hazards.²³

The promising bridge sampling approach can now use these variational methods to 
accelerate the Gibbs algorithm iterations and help the algorithm function efficiently in 
high-dimensional settings [358]. In conclusion, we therefore recommend that readers 
interested in using extreme value models in larger and larger dimensions (spatial, 
non-stationary, etc.) first inform themselves as to the current state of the art, with
the help of review articles and seminal works such as [552].


22 In early 2018, extreme value distributions are still not encoded in its competitor JAGS [621];
users interested in the power and flexibility of JAGS must encode them using programming tricks.

23 The authors approximated a GEV distribution, under weak conditions, by a Gaussian mixture.


11.5 Conclusions


This chapter has given a (necessarily incomplete) overview of the use of Bayesian
methods for statistically dealing with modeling, predicting, and extrapolating extreme
hazards. A deeper look into the theory can be made with the help of the works [591,
651], especially in terms of case studies. Fundamental aspects of the validation of 
Bayesian results are discussed in detail in [768]. The Bayesian framework is still 
little used in industrial studies and civil engineering, even though it provides a large 
number of opportunities for engineers working in industrial and environmental risk. 
In particular, it facilitates the inclusion of historical and/or regional data, as well as 
expert opinion, and helps to formalize essential questions on the issue of risk mitigation.
From the academic point of view, this research area is extremely fast-moving,
especially as the computational tools required for Bayesian computation are more and
more powerful. This is why this chapter has mainly focused on modeling practice,
which remains one of the trickiest aspects of Bayesian practice.

We are willing to bet that an increasingly important task in the industrial and environmental
risk engineering community will be to make methods for eliciting prior
information in the extreme hazards setting easier to understand, by stating precisely
what was chosen arbitrarily and by justifying the parameter choices made to “moderate”
this arbitrariness. Obviously, any Bayesian analyst confronted with the problem of calibrating
prior distributions must conduct a sensitivity study on quantities of interest (return 
levels, etc.) with respect to prior choices. These choices naturally include those relat- 
ing to the algebraic structure of prior distributions. To decrease the influence of such 
choices, it may be possible to take advantage of recent work looking to directly 
provide robust Bayesian inference [82]. For instance, [674] considered an extreme 
value setting where no formal rule allows for a simple approximation of a prior form, 
and attempts are made to avoid having to choose a particular form. The inference 
is performed in a framework including all possible prior distributions. This type of 
approach lies at the confluence of Bayesian theory and possibility theory [231]. 

Other methodological alternatives have also been proposed for integrating expert 
knowledge into the estimation of extreme events, for return periods in particular. This
is notably true for imprecise probabilities and Info-Gap theory [71], which
deal with the problem of robust decision-making vis-à-vis the inherent uncertainty in
the handling of extreme value distributions. Several developments and applications
are summarized in [290] (Sect. 5.5) and should probably be compared with methods 
currently used in engineering. 

Furthermore, the likelihood-regularization mechanism permitted by the use of 
prior measures, combined with the ability of the Gibbs algorithms to reconstruct 
observed data and take advantage of the complete likelihood, already makes it pos- 
sible to significantly improve frequentist estimations in multivariate settings [222]. 
These extremely recent results from academia, for the moment essentially tested on 
simulated data, should lead to problems in more than three dimensions now becoming 
accessible. The question of adapting them to environmental data, in a world where the 
relationships between different natural hazards are increasingly of interest, appears 
to be of immediate importance.


Abstract This chapter provides a brief summary of the latest results obtained in
extreme value theory, and offers many suggestions for the reader interested in using 
these tools in a context where, in particular, statisticians and engineers cannot ignore 
the Big Data paradigm and the industrialization of machine learning tools, now 
essential components of modern artificial intelligence. Parallels are also made with 
other disciplines interested in extreme statistics. 


12.1 Conclusions and Recommendations 


The philosophy behind extreme event modeling can be described as an attempt to 
reduce the impact of strong cognitive biases caused by easy access to non-extreme 
observations, and the loss of historical memory of past extreme events. Some such 
biases are in play in events called black swans (economic crashes, wars, scientific 
discoveries, etc.), popularized in 2009 by [734]. Arriving unexpectedly and with 
huge societal impact, these types of events are rationalized afterwards, as if they 
could have been predicted. This is an illusion, as [736] states openly. In effect, the
complexity of the systems around us continues to increase, and it is already difficult
in practice to predict non-extreme events in the more or less near future. According to
[735, 736], this error is the worst possible epistemological one that can be made in
risk management. Due to increasing complexity, it is important to not systematically 
and indiscriminately rely on past data within mathematical models that claim to 
provide predictions of extreme events. Such data may become rapidly obsolete, and 
are no help in looking for new sources of risk. Risk management generated by those 
particular extreme events should instead be based on non-inductive* decision logic, or 
on the smallest possible number of axioms. This would imply not using mathematical 
modeling, which requires a clear delineation of the most influential factors, modeling 
theorems, and validation techniques based on simulation.

The class of events considered in this book does not belong, by scientific and
societal consensus, to the black swans category [116]. This means that the relation- 
ship between the nature of the objects looked at and their mathematical description 
based on probability theory results from a long and ongoing process of discussion 
and testing. Arguments justifying the stochastic treatment of natural hazards were
provided in Chap. 3 (Sect. 3.1.2), but this justification remains relative to the state of
science and technology, and its relevance should be re-examined on a regular basis.

On the basis of this consensus, the theory of extreme values has for many years 
been a common and unique reference point for the estimation of extreme values 
output from natural phenomena. This theoretical framework has allowed engineers 
to overcome subjective extrapolation techniques and converge toward a common 
approach. It thus plays a leading role in guiding choices in natural risk mitigation, 
whether for the industrial world or society as a whole. We hope that readers are now 
fully convinced of this. 

Nevertheless, practical applications of the theory—which we hope to have illus- 
trated well in this book—are fraught with problems that are difficult to solve, and 
that sometimes put the credibility of extrapolation results into question. The estimation
of the shape parameter ξ, for example, is often the subject of long discussions,
since a small change in its assessment (ordinarily a difficult task) has significant
consequences for extrapolation to rare events. Controversy surrounds the occasionally
very extreme values that are produced in estimates. Are they physically plausible? 
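
The sensitivity of extrapolation to the shape parameter can be made concrete with the standard GEV return level formula, z_T = μ + (σ/ξ)[y^(−ξ) − 1] with y = −log(1 − 1/T), which reduces to μ − σ log y when ξ = 0. The short sketch below evaluates it for a few values of ξ; the location and scale values are hypothetical and serve only to show the order of magnitude of the effect.

```python
import numpy as np

def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T blocks (e.g., years) under a GEV."""
    y = -np.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * np.log(y)            # Gumbel limit (xi -> 0)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

mu, sigma = 10.0, 2.0                            # hypothetical location and scale
for xi in (0.0, 0.1, 0.2):
    levels = [gev_return_level(T, mu, sigma, xi) for T in (100, 1000, 10000)]
    print(f"xi = {xi:.1f}:", np.round(levels, 1))
```

With these values, the gap between the return levels obtained for the different ξ widens as the return period T grows, which is the behavior discussed above.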

Often, debate ignites around the existence (or non-existence) of physical limits to
the intensity of phenomena. Since the theory of extreme values provides for the 
existence of distributions unbounded above, what does this say about the acceptability 
of hypotheses involving limits? And if we admit the existence of an upper limit for 
the intensity of a phenomenon, how should it be defined? Physically speaking, the 
total rain capacity is limited by the (finite) amount of water in the atmosphere, but to 
imagine a phenomenon reaching this limit is obviously not realistic. In this case, can 
deterministic approaches, such as Probable Maximum Precipitation (PMP; [741]) 
or Probable Maximum Flood (PMF, [456]) serve as references? Would we be able 
to introduce this information into the estimation of statistical distributions if it were 
available and reliable? And how should we proceed for phenomena which logically 
should be bounded above, if analysis of the data leads us to estimate unbounded 
statistical distributions? And lastly, if we are content to extrapolate from distributions 
advocated by the theory, how can we integrate knowledge on physical phenomena that 
trigger extreme events? Here we touch on the limits of the application of statistical 
theory, which has justified, in this book, a deeper look at computational approaches 
(Chap. 8). 

Some adherents to the theory mention the fact that sometimes major assumptions 
(such as data independence and convergence of estimates) do not hold, which could 
explain statistical results that do not conform to expectations and to physical knowl- 
edge of a phenomenon. In this context, they work on how to give advice to ensure 
that methods are applied in a reasonable manner in accordance with the assumptions. 
Others, who are more skeptical, point to the fact that extrapolations of statistical
distributions are made without taking into account underlying physical phenomena,
and are therefore unreliable by definition. In particular, it is common to refuse to
extrapolate up to the frequency $10^{-n}$ if we do not have a sample of at least $10^{n}$ data
points allowing us to validate it. By the same token, it is the fundamental asymptotic
hypothesis, which underlies the entirety of the theory, that is considered unrealistic.

Thus, the question is: can this theory still be used, taking into account these limits?
Some experts advise stopping at estimates for return periods close to the length of
time over which data are available. This position is certainly very reasonable, but quickly
prevents the use of the theory in a large number of practical cases where industrial firms,
insurers or legislators wish to estimate frequencies lower than those which can be
empirically provided by the given data. Our experience in using the data and models
described in this book, and our knowledge of industry, leads us to suggest the
following.

First, it is essential to make several estimates, via a number of different methods, 
under the two frameworks (MAXB and POT). Confidence intervals around parame- 
ters and functions of interest should always be calculated, provided, and discussed. 
Their definition depends on the models used, and their calculation is strongly influenced
by the amount of data available. The calculation hypotheses must therefore be stated
clearly, and the validity of these intervals, which are conditional on the choice of 
model, should be tested using simulation techniques. Moreover, it is always useful 
to compare the results produced with deterministic approaches with the probabilis- 
tic distributions resulting from statistical calculations. Discussions of these results 
must include input from one or more experts, which requires that statisticians pro- 
duce meaningful orders of magnitude, called anchoring values*. Keep in mind that 
the use of extreme value distributions is a simplified—sometimes oversimplified— 
response to strong social pressure on decision makers and risk quantifiers [107]. It 
is therefore necessary to consult and consult again before approving a result. 
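
One simulation-based way of producing and examining such intervals is the parametric bootstrap: data are repeatedly simulated from the fitted model, the model is refitted, and the spread of the resulting return levels is summarized. The sketch below does this for a GEV fitted by maximum likelihood with SciPy; the simulated data, the 70% level, the number of replicates, and the function names are assumptions made for illustration, and the interval remains conditional on the GEV model being adequate.

```python
import numpy as np
from scipy.stats import genextreme

def return_level(params, T):
    """T-block return level for fitted SciPy parameters (c, loc, scale)."""
    c, loc, scale = params
    return genextreme.ppf(1.0 - 1.0 / T, c, loc=loc, scale=scale)

def bootstrap_interval(x, T=100, n_boot=100, level=0.70, seed=0):
    """Point estimate and percentile bootstrap interval for the return level."""
    rng = np.random.default_rng(seed)
    fitted = genextreme.fit(x)                        # maximum likelihood fit
    estimates = []
    for _ in range(n_boot):
        # Simulate a sample of the same size from the fitted model and refit
        xb = genextreme.rvs(*fitted, size=len(x), random_state=rng)
        estimates.append(return_level(genextreme.fit(xb), T))
    estimates = np.asarray(estimates)
    estimates = estimates[np.isfinite(estimates)]     # drop failed fits, if any
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(estimates, [alpha, 1.0 - alpha])
    return return_level(fitted, T), lo, hi

# Hypothetical sample of 60 annual maxima
x_obs = genextreme.rvs(-0.1, loc=10.0, scale=2.0, size=60, random_state=2)
est, lo, hi = bootstrap_interval(x_obs)
print("100-year level:", round(est, 2), "70% interval:", (round(lo, 2), round(hi, 2)))
```

Repeating the exercise with a POT model, or with a different estimator, gives a direct view of how much the interval depends on the modeling choices.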

Second, important work needs to be conducted on the validity of the data being 
used, including the statistical interpretation of historical and regional data. Statisti- 
cians must call upon historians and physicists to agree on one or several interpre- 
tations, and any strong sensitivity of the estimates to the keeping or removing of a 
certain data point must be discussed from a physical point of view. Is it an outlier? 
Is it due to an incorrect measurement? The ongoing collection of new
data and a critical analysis thereof is necessary, and this “novelty” should be con- 
nected with the forgetting of past extreme events that characterizes modern society 
(as described in the first chapter of this book). 

Third, the meaning of the concepts of confidence and credibility must be kept 
in mind. Very often, dimensioning rules are produced in terms of upper limits of 
confidence intervals, which offer a certain level of conservatism. However, this con- 
servatism must be justified, and its nature depends on the notion of underlying cost 
in dimensioning problems. The more information we have on this cost, the more the 
probabilistic interpretation of the limits can be clarified. Note that information on 
cost is part of the prior knowledge we have on a problem, and though imperfect, 
it would be a pity to deprive ourselves of it, as it helps us to place the theory of
extreme values into a (simplified) decision-support framework, which then allows
statisticians to conduct sensitivity studies that can be comprehended by engineers. 

The increasing desire of civil society, including industry, to mitigate the effects of 
extreme natural hazards, requires a constant increase in the arsenal of techniques ded- 
icated to risk quantification. While we believe that despite its shortcomings, extreme 
value theory is a fundamental building block of this arsenal, we also believe that it 
must continue to evolve in relation to recent scientific advances. This concerns, for
example, data collection and falling computational costs, which are starting to permit
the analysis of high-dimensional models. The following paragraphs list some future
paths in this direction. Our long-term vision is that the decision-support allowed by 
the theory leads to the proposition of anti-fragile mitigation rules [735]. That is, 
decisions that become more and more solid when new knowledge is added, at a level 
above and beyond robustness (if a hazard is powerful, mitigation is effective) and 
resilience (if a hazard is strong, mitigation may occasionally not be 100% effective, 
but still relatively so). 

From experience gathered in our industry working group, we are deeply con- 
vinced that approaches leading to evolution and innovation in these methods must be 
multidisciplinary. This includes the skills of various scientific communities, people 
working at the forefront of applied settings—dedicated to sharing the experience of 
the industrial sector—and that of other sectors of society, such as civil engineering. 
We hope that this book marks a significant step in this direction. 


12.2 Perspectives 


12.2.1 Analyzing Large Databases 


Without changing the theory itself, we can today hope for increased access to larger
and larger and/or spatially distributed databases. The scientific community as a whole
has new methods of observing the Earth (i.e., a massive deployment of sensors and 
satellites whose cost has decreased significantly since around 2010), and is bene- 
fitting from an increasing effort to reconstruct data via historical analyses, and the 
use of increasingly sophisticated simulation models. This bringing together of data 
types is a major scientific advance and a formidable asset, provided that models are 
developed that know how to exploit such large quantities of information, especially 
in the spatial sense. 

The detection of useful patterns, or variables (feature selection), by machine 
learning techniques is a fundamental issue for companies and institutions facing 
massive increases in available data (the Big Data setting). Challenges include not 
only the increasing number of sensors but also the diversity of measured variables, 
the range of validity of these sensors, their accuracy (which influences data quality), 
the increasing number of supposedly explanatory variables, as well as the high rate
of acquisition.¹ The first attempts to use probabilistic modeling for extreme values
from the real world uncovered in extremely large databases have been published quite 
recently [338, 701, 802], and seem promising. These are counterparts to methods 
aiming to identify systemic weaknesses in industrial infrastructure by the joint use 
of fault trees (classical tools in probabilistic safety assessment (PSA) studies) and 
statistical relationships between probabilities of occurrence and covariates [361]. 


12.2.2 Making Distributions More Robust 


The statistical approach to extreme values is still criticized, within the scientific 
community, for a certain lack of robustness (which implies we should not stick to a 
unique estimate in practice). This includes restrictions on types of data, the choice 
of estimators, and the choice of distributions used. As stated by [107], the theory
applies only when the phenomenon is continuous and purely random, and the rate of
convergence of the maxima (or minima) toward the limit distribution strongly
depends on the nature of the distribution of non-extreme values. One potential path
forward is to construct modeling theorems that work for a greater variety of data.


12.2.2.1 Robustness of Statistical Estimates and Limits to 
Extrapolation 


The robustness of statistical estimates will clearly continue to improve due to regionalization
when natural phenomena are not too “local”, and the Bayesian framework
can be used effectively if, very generally speaking, additional information on top
of local data is available (see Sect. 11.5). The use of the entire extreme value
data sample, including correlated data clusters, can also help strengthen statistical
estimates of return levels or periods (or any other quantity of interest). It is therefore
necessary to take into account, in analyses, extreme values that are similar but correlated
over time. The copula-based approach described in Chap. 9 would then need to be
modified, as it requires events to occur at simultaneous times.

A promising approach not requiring declustering, proposed by [647] and illustrated
on flood forecasting, relies, for example, on Markovian modeling using dependency
structures between extreme exceedances, and is situated somewhere
between a POT approach and time series analysis. The number of dimensions of the
underlying copula determines the order of the Markov process. This methodology
may well complement the functional approaches described in Chap. 8. In the GEV
framework, a similar approach, Bayesian but restricted to Gaussian copulas, was
recently proposed by [560]. As the use of extreme copulas improves, it is becoming
essential to focus on how to use them in high-dimensional settings (using vine copulas
for instance [76]), in order to capture information related to inertia in extreme
phenomena (e.g., aftershocks of earthquakes, repeated floods, etc.), and to refine
probabilistic forecasts by considering sequences of extreme events that happen in a
very short time period. Indeed, it is not obvious that a dimensioning produced from
a single hazard, even if bounded above, is also robust to a sequence of slightly less
powerful hazards over a short time period. Beyond this, it appears essential to develop
statistical models that consider several variables at the same time, and are also able
to characterize the spatial distribution of phenomena and interactions between them.
Other approaches based on mixing non-extreme data with extreme data, using conditioning,
also constitute possible research directions, and have barely been looked
at until now. The articles [162, 383] give an outline of such studies, in a multivariate
context, for the frequentist and Bayesian frameworks.

1 That is, the spacing used between measurement times, which can limit or even prohibit the
data being loaded into computer memory, and force the detection of extreme values to be
conducted in situ.

Lastly, the robustness of an extrapolation (such as a return level) itself strongly 
depends on the robustness of the statistical estimation. Establishing formally (and not 
just empirically) limits to extrapolation is a major roadblock, which has only recently 
begun to be attempted in a one-dimensional framework [29]. Here again, extensions 
to settings with multivariate, spatial, and trending data, are important research paths 
of the future. 


12.2.2.2 Distributional Robustness 


The difficulty of verifying theoretical conditions leading to the use of distributions 
from extreme value theory has recently led some researchers to study so-called 
distributional robustness in the context of extreme natural hazards studies, in order 
to severely limit the impact of modeling errors resulting from the theory. These 
theoretical limits have for example been studied by [496] for financial indicator time 
series. In response, the study [98] proposes a methodology for estimating quantiles 
(return levels) in a nonparametric way, by looking at a class of heavy-tailed statistical 
distributions located in the neighborhood of an extreme value distribution, rather 
than using one distribution only. The neighborhood is defined via the choice of 
a class of divergences between statistical distributions, and neighborhoods can be 
bounded by taking advantage of the convex geometry underlying the problem. As a 
result, it becomes possible to define bounds on the tails of the statistical distribution 
representing the phenomenon, and deduce robust estimates of return levels. 

Modeling errors resulting from the choice of a correlation structure between multivariate
hazards are also problematic. How can risk measures be produced that are
robust to errors caused by uncertain modeling choices? Recently, bounds on
Pickands’ function and excess probabilities have been obtained by [250] as a partial
answer to this question. Tested on financial data, this approach, again based on
neighborhoods of measures (defined by divergences), provides a conservative view
of the co-occurrence of extreme natural hazards, in high-dimensional settings.

The practical implementation of the results of this initial research, and extensions
to more general frameworks, remain to be done.


12.2.2.3 Extending Results to Other Distributions


Extensions of distributions described in this book are numerous. As an example,
spatio-temporal characteristics of multivariate extreme values are currently
being studied through the prism of peaks-over-threshold analysis applied to functions,
which makes it possible to define complex extreme events as special types of exceedances,
and then obtain their limit tail distribution, the so-called generalized r-Pareto process
[284], a generalization of the approach proposed by [222].

To limit the risk of error due to the use of extreme value models, whose con- 
struction assumptions are difficult to verify, and to relax constraints on the nature 
of the data, an additional possibility is to include, in the class of distributions mentioned
above, some with new properties. For instance, the fact that extreme
value distributions are not stable under the summation of extreme variables may
prevent modeling a coalescence phenomenon. Lévy distributions, which are sum-stable
and correspond to the asymptotic behavior of sums of random variables that do not
fall under the central limit theorem [36, 498], could be useful.

Another possibility is to produce classes based on mixtures of extreme value distri- 
butions. Little-used in engineering, this point of view is currently being investigated 
by [286, 489, 580, 678]. 


12.2.3 Interactions Between Statistical and Numerical 
Models 


The coupling of statistical models to data from numerical models, used in a stochastic 
framework, represents a kind of parallel tool that attempts to decide how to implement 
encompassing hierarchical models [695], with the help of ever-increasing computing
power. The most significant attempts [637] in this direction still require strong
assumptions on the invariance of parameters in the extreme value distributions. Additional
difficulties posed by the calculation time required by computational models
can be partly countered using meta-models (mentioned in Chap. 10), possibly based
on classical machine learning tools, which are capable of simulating large amounts
of data. Recent approaches are reviewed in [455]. Developing meta-models that are
well adapted to extreme value situations is therefore an important goal in linking 
these two major information sources. 


12.2.4 Sustaining a Multidisciplinary Approach 


Though this book focuses on the use of extreme value statistical theory for character- 
izing natural hazards, the theoretical and practical insights of the theory are part of a 
much broader scientific field. In particular, keeping an eye on methods developed in 
finance, insurance, and fraud detection is an ongoing necessity. Though these areas 
have long been characterized by large quantities of data, where noise is low or non-existent,
the economic stakes are such that many developments from these domains
will become useful in the study of natural hazards.

Let us briefly look at three such examples of important work, dealing with risk 
perception, the selection of explanatory variables, and the definition of new measures 
of extreme risk. 


12.2.4.1 Risk Perception 


The perception of risk by people working in financial markets can be partially cov- 
ered by insurance contracts; the case of extreme risks associated with large levels 
of uncertainty remains, however, problematic, in the sense that no clear methodology
exists for determining the price of insurance premiums [235]. The inability of
economic actors to solve this kind of problem testifies to the need to adopt, with a 
certain dose of humility, multiple approaches for calculating extreme risks, which is 
in line with the recommendations in the preceding section. 


12.2.4.2 Selecting Explanatory Variables 


In the field of insurance, hierarchical Bayesian analysis is often used to produce a 
model that can deal with heterogeneity problems in extreme value data, when these 
originate from different sources [102]. Even though applying this to extreme value 
data coming from the natural world (which have normally been arranged to form 
an as homogeneous as possible dataset) is not immediately obvious, the scarcity of 
certain events could lead us to pool data from events sharing explanatory phenomena, 
whether correlated or not (for example, floods due to heavy rains and/or melting ice), 
or measured by different methods, and to differentiate the specific characteristics of 
the shared features in the statistical analysis. 


12.2.4.3 New Measures of Extreme Risk 


In the specific context of risk mitigation associated with natural events, exceedance 
probabilities, levels (quantiles), and return periods are not the only subjects of pos- 
sible dialogue with regulatory authorities [452]. Other risk measures that are well 
adapted to extreme events have been studied by various communities, in particular
financial and actuarial, and have better coherence properties.² This is true in
particular for conditional values-at-risk [656], also called Bregman super-quantiles
[449], which characterize how tails of statistical distributions behave beyond extreme 
quantiles. 
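
To illustrate how such a measure looks beyond an extreme quantile, the sketch below computes an empirical value-at-risk (a quantile) and an empirical conditional value-at-risk (the mean of the losses in the upper tail) on synthetic heavy-tailed data, and compares the conditional value-at-risk of a sum with the sum of the individual ones. The Pareto samples, the 0.99 level, and the function names are assumptions for illustration only.

```python
import numpy as np

def value_at_risk(losses, alpha):
    """Empirical quantile of order alpha."""
    return np.quantile(losses, alpha)

def conditional_value_at_risk(losses, alpha):
    """Mean of the k largest losses, with k close to n * (1 - alpha)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, int(round(len(losses) * (1.0 - alpha))))
    return losses[-k:].mean()

rng = np.random.default_rng(0)
x = rng.pareto(3.0, size=100_000)     # synthetic heavy-tailed losses
y = rng.pareto(3.0, size=100_000)
alpha = 0.99

var_x = value_at_risk(x, alpha)
cvar_x = conditional_value_at_risk(x, alpha)
cvar_y = conditional_value_at_risk(y, alpha)
cvar_sum = conditional_value_at_risk(x + y, alpha)

print("VaR(x):", round(var_x, 3), "CVaR(x):", round(cvar_x, 3))
print("CVaR(x + y):", round(cvar_sum, 3), "CVaR(x) + CVaR(y):", round(cvar_x + cvar_y, 3))
```

Because this estimator averages the same number of largest values for every sample of equal size, the comparison of the sum with the individual terms holds on the samples themselves, not only in expectation.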


2 Especially, the sub-additivity of a risk measure $\mathcal{R}$: $\mathcal{R}(X + X') \le \mathcal{R}(X) + \mathcal{R}(X')$. See [656] for more
details.


Part III 
Detailed Case Studies on Natural Hazards 


Chapter 13
Predicting Extreme Ocean Swells


Pietro Bernardara 


Abstract This chapter illustrates the conduct of a classic univariate extreme statis- 
tical study on severe ocean swell forecasting. 


13.1 Context of the Study 


13.1.1 The Variable of Interest and Probability Levels 


Following the procedure described in Sect. 3.1.3, the variable of interest here is the 
significant spectral height (see below) of the sea state, which describes the surface 
of the sea when under the influence of wind (which generates wave systems) and 
swell. The significant spectral height is an often-used quantity for characterizing 
the sea state. It is often denoted $H_s$, $H_{m0}$, or SWH (significant wave height). This
variable can be obtained via the spectro-angular distribution of the sea state variance
$F(f, \theta)$ (usually expressed in the units m² · Hz⁻¹ · rad⁻¹), also known as the directional
spectrum of the sea state variance. This function characterizes the division
of the sea state’s energy across frequencies $f$ and the directions $\theta$ of origin. The
significant spectral height is calculated starting from the zero-order moment $M_0$ of
the directional spectrum of the variance:


$$H_{m0} = 4\sqrt{M_0},$$


with


$$M_0 = \int_0^{\infty} \left[ \int_0^{2\pi} F(f, \theta)\, d\theta \right] df.$$


Under the so-called deep sea hypothesis that swell heights are distributed according
to a Rayleigh distribution, the significant spectral height is equal to the statistically
significant height $H_{1/3}$, which corresponds to the mean height (measured between
peak and trough) of the highest third of wave heights.
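
As a numerical illustration of the zero-order moment and of the significant spectral height defined above, the sketch below integrates a directional spectrum over directions and then over frequencies with the trapezoidal rule. The spectrum is a hypothetical separable one (a Gaussian bump in frequency times a cos² spreading in direction), not an ANEMOC or TOMAWAC output, and the grids are arbitrary.

```python
import numpy as np
from scipy.integrate import trapezoid

f = np.linspace(0.03, 0.5, 400)                  # frequencies (Hz)
theta = np.linspace(0.0, 2.0 * np.pi, 361)       # directions (rad)

# Hypothetical separable spectrum F(f, theta) = S(f) * D(theta)
S = np.exp(-0.5 * ((f - 0.1) / 0.02) ** 2)       # frequency shape (m^2 / Hz)
D = np.cos(0.5 * (theta - np.pi)) ** 2 / np.pi   # spreading, integrates to 1 over [0, 2*pi]
F = S[:, None] * D[None, :]                      # shape (n_freq, n_dir)

# Inner integral over directions, then outer integral over frequencies
inner = trapezoid(F, theta, axis=1)
M0 = trapezoid(inner, f)
Hm0 = 4.0 * np.sqrt(M0)

print("M0 (m^2):", M0)
print("Hm0 (m):", Hm0)
```

Since the spreading function integrates to one, M0 here is simply the area under S(f), which gives an easy check of the double integration.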

The probability level required for this study corresponds to a centennial return 
period, i.e., p = 0.01. The quantity of interest to estimate is, therefore, the centennial 
significant spectral height. Following recommendations similar to those expressed 
by the ASN (the French Nuclear Safety Authority) in [15], we would also like to 
estimate the upper limit to the 70% confidence interval associated with this quantity 
of interest (see Sect. 5.3). 


13.1.2 Data Collection


Our case study is about high swell events off the coast of Yeu Island (in the Vendée 
department of France—see Fig. 0.1). We do not, however, have directly observed 
significant swell height data (e.g., from wave buoys or high-frequency tide gauges); 
instead, the data used has been generated using sea state and swell height models. 

These (reanalyzed) simulations of spectral heights were run using the ANEMOC 
database (a numerical atlas of oceanic and coastal sea states), which contains simulated
sea state data created by LNHE (National Hydraulics and Environment
Laboratory) of EDF in collaboration with CETMEF (the Center for Maritime and 
Fluvial Technical Studies) [74]. The ANEMOC database was constructed using the 
TOMAWAC software [75] to perform 23-year simulations with a temporal resolution 
of 240 seconds over the period 1979-2002. The simulated data are available at a time 
resolution of 1 hour. TOMAWAC has been developed by LNHE to model the temporal 
and spatial evolution of the directional spectrum of the sea state. 

These reanalyses produced with the help of ANEMOC were calibrated and va- 
lidated using comparisons with wave buoy measurements found in the CANDHIS 
database from CETMEF for the years 1999 and 2000 [74]. 


13.2 Sampling Extreme Values 
13.2.1 Visualizing the Data 


The aim is to extract from the available data a sample of independent extreme values,
sufficiently numerous and stationary for classical theorems (Chaps. 4 and 6) to be
applied. Taking one maximum per year leads to a sample of 23 annual maxima of the
significant height, as shown in Fig. 13.1.

This sample seems rather small for conducting a study of extreme value statistics,
so we instead use a peaks-over-threshold (POT) approach to increase the number of
useful data points. For this second sample, the threshold is fixed at $u = 4$ m, and
a minimal separation of 2 days between consecutive values above this threshold is used,
leading to a set of 226 data points. These values (2 days, 4 m) were chosen after
consulting with experts on the duration of typical storms in the region and what they
consider an extreme swell to be. This sample is shown in Fig. 13.2.

Fig. 13.1 Sample of annual significant heights (in m) of the sea state, extracted from
the ANEMOC database for 1979-2002

It turns out that the number of occurrences per year can be represented by a
Poisson distribution, whose parameter $\lambda$ is estimated by $\hat{\lambda} = 9.83$ using maximum
likelihood (equal here to the mean number of events exceeding the threshold per
year). The quality of this fit is shown in Fig. 13.3.

Fig. 13.3 Fitting a Poisson distribution to the process of the number of occurrences
per year exceeding the threshold (y-axis) in the POT sample of significant heights
over 4 m. The x-axis corresponds to the number of threshold exceedances
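
The estimate of the Poisson parameter can be checked by hand, since its maximum likelihood estimator is simply the mean annual count. The short sketch below uses the two totals given in the text (226 exceedances over 23 years); the function name and the expected-frequency table are illustrative additions, not part of the original study.

```python
import math

n_exceedances = 226  # size of the POT sample (from the text)
n_years = 23         # length of the simulated record (from the text)

# Maximum likelihood estimate of the Poisson rate: the mean count per year.
lam_hat = n_exceedances / n_years

def poisson_pmf(k, lam):
    """Probability of observing exactly k events in one year."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected number of years (out of 23) with k exceedances, for comparison
# with an observed histogram of annual counts.
expected_years = {k: n_years * poisson_pmf(k, lam_hat) for k in range(21)}
```

Comparing `expected_years` with the observed number of years having each count is one way to reproduce the kind of comparison shown in Fig. 13.3.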


13.2.2 Testing for Stationarity 


Stationarity is a fundamental hypothesis when working with extreme value statistics.
In the present situation, stationarity of the sample of annual maxima is
not rejected by the Mann-Kendall test (cf. Sect. 5.4.3.2), which was run using the
MannKendall function from the Kendall package in R. The p-value is 0.245, well
above the chosen type I error of 5%. This large value indicates that the observed test
statistic does not fall in the range of extreme values of the distribution of this statistic
under the stationarity hypothesis, meaning that this hypothesis cannot be rejected. The
same test run on the sample made up of peaks over 4 m confirms the result, with a
p-value of 0.689.
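
For readers who wish to see what the test computes, here is a minimal sketch of the Mann-Kendall statistic with its usual normal approximation. It is not the implementation of the R Kendall package: it assumes no ties in the data (no tie correction in the variance) and a two-sided alternative.

```python
import math

def mann_kendall(x):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes no tied values; uses the normal approximation with
    continuity correction.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p
```

A p-value above the chosen significance level, as obtained for both samples in this study, means that the hypothesis of no monotonic trend is not rejected.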


13.2.3 Analyzing Data Independence 


Here we wish to check whether the sample is composed of iid values and therefore
consistent with the hypotheses involved in the main extreme value statistics methods.
To check for independence of the sample values, Fig. 13.4 shows the calculation of
the autocorrelation coefficient (cf. Sect. 5.4.1.1). As the vertical lines remain within
the interval bounded by the dotted lines, which indicates the predominance of white
noise, the observed annual maxima can reasonably be considered
decorrelated, if not independent (see Sect. 6.5.2.2, Fig. 6.8 for more details). Similarly,
for the sample of POT values, the calculation of the autocorrelation does not
lead to the rejection of the independent data hypothesis (Fig. 13.5).
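
The autocorrelation check described above can be sketched as follows. The band $\pm 1.96/\sqrt{n}$ is the usual approximate 95% band under a white noise hypothesis; we assume here that the dotted lines in the figures correspond to a band of this kind, which the text does not state explicitly.

```python
import math

def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for white noise."""
    return z / math.sqrt(n)

def lags_outside_band(x, max_lag):
    """Lags (>= 1) whose autocorrelation falls outside the band."""
    band = white_noise_band(len(x))
    acf = autocorrelation(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]
```

For a sample of 23 annual maxima the band is roughly $\pm 0.41$, which is wide: with so few values, only strong serial dependence would be detected.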

Fig. 13.5 Autocorrelation of the POT sample


13.2.4 Analyzing the Sample Size


The sample size of annual maxima is relatively small. This may lead to a high
level of estimation uncertainty and thus to particularly wide confidence intervals.
Here we recommend paying particular attention to the obtained widths. The size of
the POT sample is, however, satisfactory in view of the criteria considered in Chap. 5.


13.3 Statistical Estimation


The two extreme value distributions induced by the sampling methods (GEV and
GPD) are estimated here using maximum likelihood, and the confidence intervals on
the estimators are produced using the delta method (Theorem 4.4).

For the GEV distribution estimated using the sample of annual maxima, the estimated
parameters have the following values: $\hat{\xi} = -0.31$, $\hat{\mu} = 8.10$, and $\hat{\sigma} = 1.46$.
The value for the shape parameter $\xi$ is strongly negative. Because of this, the
distribution is bounded above by $x_{\max} = 12.80$ m. It is impossible, however, with current
knowledge, to justify the statistically estimated upper bound by the existence of a
physical bound on the phenomenon. It is simply a feature of the tail of the best-fitting
distribution for the data available to us.

This feature appears again when estimating a GPD distribution using the sample
of values over the threshold $u = 4$ m. The estimated parameters have the values
$\hat{\xi} = -0.15$ and $\hat{\sigma} = 2.04$ m. Because of the negative value of the shape parameter,
this distribution is bounded above by $x_{\max} = 17.60$ m. The confidence intervals obtained
in both situations are consistent with each other. However, the much larger number of
data in the latter sample means that its estimate of $\xi$ is more reliable.
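
The two upper bounds quoted above follow directly from the estimated parameters: for a negative shape parameter, the upper endpoint is $\mu - \sigma/\xi$ for the GEV distribution and $u - \sigma/\xi$ for the GPD. The sketch below, with function names of our own choosing, recomputes them from the rounded estimates given in the text.

```python
def gev_upper_endpoint(mu, sigma, xi):
    """Upper endpoint of a GEV distribution with negative shape xi."""
    if xi >= 0:
        raise ValueError("the GEV distribution is bounded above only if xi < 0")
    return mu - sigma / xi

def gpd_upper_endpoint(u, sigma, xi):
    """Upper endpoint of a GPD above threshold u with negative shape xi."""
    if xi >= 0:
        raise ValueError("the GPD is bounded above only if xi < 0")
    return u - sigma / xi

x_max_gev = gev_upper_endpoint(8.10, 1.46, -0.31)  # about 12.8 m
x_max_gpd = gpd_upper_endpoint(4.0, 2.04, -0.15)   # about 17.6 m
```

With the rounded parameters, the GEV bound comes out at about 12.81 m, in agreement with the 12.80 m reported up to rounding.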


13.4 Tests and Graphical Validation 


The statistical distribution chosen for the study needs to be validated using statistical
tests and an empirical analysis of the results. We thus apply the Kolmogorov-Smirnov
test to the sample of block maxima (cf. Example 4.4) using the ks.test
function in R, in order to test the goodness of fit to a GEV distribution with parameters
$\mu = 8.10$ m, $\sigma = 1.46$ m, and $\xi = -0.31$. The test statistic
is $R_n = 0.112$ and the p-value is 0.902. The goodness-of-fit hypothesis is, therefore,
not rejected at the 5% level. This result is corroborated by the same test applied to
the threshold exceedance data for a GPD distribution with parameters $u_0 = 4$ m,
$\sigma = 2.04$ m, and $\xi = -0.15$, giving


$R_n = 0.049$,
p-value = 0.650.
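
To make the test statistic concrete, the sketch below computes the Kolmogorov-Smirnov distance between an empirical distribution and a fully specified GEV distribution. It is a hand-written illustration rather than a reproduction of ks.test, it does not compute a p-value, and the sample used is synthetic (exact GEV quantiles), since the original data are not reproduced here.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function (case xi != 0)."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper bound if xi < 0,
        # below the lower bound if xi > 0.
        return 1.0 if xi < 0 else 0.0
    return math.exp(-t ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for 0 < p < 1 (case xi != 0)."""
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)

def ks_statistic(sample, cdf):
    """Largest gap between the empirical and the theoretical cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

mu, sigma, xi = 8.10, 1.46, -0.31
n = 23
# Synthetic sample placed at the plotting positions (i - 0.5) / n.
synthetic = [gev_quantile((i - 0.5) / n, mu, sigma, xi) for i in range(1, n + 1)]
d_n = ks_statistic(synthetic, lambda x: gev_cdf(x, mu, sigma, xi))
```

For this perfectly regular synthetic sample the distance is $1/(2n) \approx 0.022$, the smallest value attainable with 23 points; the value 0.112 reported for the real annual maxima should be read against that floor.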


The plots shown in Fig. 13.6 help visualize the quality of the fit of the parametric 
distribution to that of the sample of annual maxima. The top two plots illustrate the fit 
with respect to a line of reference, indicating that the data are globally well modeled, 
including the most extreme ones. The return level plot (bottom left) shows modeled 
and observed return levels as a function of annual return periods. We see that all of 
the points are located inside the confidence interval. Finally, the last plot (bottom 
right) contrasts the theoretical distribution with the empirical one. 

Fig. 13.6 Goodness of fit of the GEV distribution with respect to the distribution of the sample of
annual maxima


Figure 13.7 gives the results for the sample of POT values. The fit of the GPD
distribution to the data appears better than the GEV one. We remark also that the
confidence intervals around the GPD distribution are not tighter than those around
the GEV one. The effect of the small sample size of annual maxima is offset
by the fact that the GPD distribution has a less negative shape parameter, meaning
that the GEV distribution is more tightly bounded. Given the difficulty in fitting this
parameter, and its particularly high absolute value, its value obtained for the GEV
distribution is quite uncertain and probably not reliable.


13.5 Results and Critical Analysis 


In this case study, the variable of interest is principally the significant height $z$
associated with a required probability level, here for a centennial return period. According
to the GEV fit, we obtain an estimate of the significant centennial height $z_{100}$ of
11.67 m, with an upper bound to the 70% confidence interval of 12.40 m. As for the
GPD fit, with an average of $\lambda = 9.83$ values selected per year above the threshold
$u = 4$ m, we obtain an estimate of the significant centennial height $z_{100}$ of 12.67 m,
and an upper bound to the 70% confidence interval of 13.90 m.
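
The point estimates above can be approximately recovered from the fitted parameters using the standard return level formulas for the GEV distribution and for the GPD combined with a Poisson rate $\lambda$. The following sketch uses the rounded parameter values quoted in the text; the function names are ours.

```python
import math

def gev_return_level(mu, sigma, xi, T):
    """T-year return level of annual maxima following a GEV (xi != 0)."""
    p = 1.0 / T
    return mu + sigma / xi * ((-math.log(1.0 - p)) ** (-xi) - 1.0)

def gpd_return_level(u, sigma, xi, lam, T):
    """T-year return level for a POT model: GPD excesses over u,
    with on average lam exceedances per year (xi != 0)."""
    return u + sigma / xi * ((lam * T) ** xi - 1.0)

z100_gev = gev_return_level(8.10, 1.46, -0.31, 100)
z100_gpd = gpd_return_level(4.0, 2.04, -0.15, 9.83, 100)
```

With these rounded inputs the sketch gives about 11.68 m for the GEV fit and about 12.76 m for the GPD fit. The first agrees with the reported 11.67 m; the second is close to, but not identical with, the reported 12.67 m, a gap that may come from the rounding of the published parameters.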

Fig. 13.7 Goodness of fit of the GPD distribution to the set of significant heights over 4 m


The generally good fits obtained mean that we can approve—in this initial 
analysis—the choice of the POT approach and the results associated with it. The 
main factor in this decision is the much larger dataset available. However, before 
definitively validating the results, it is advisable to carry out a more precise study 
of the use of the GPD distribution, in particular, by refining the level of significance 
(or type I error risk) of the statistical tests being used. In effect, the choice of a 
5% level is now being challenged in a growing number of settings (cf. Sect. 4.1.3), 
and lowering it to 1% or even 0.1% can help better evaluate a phenomenon’s prop- 
erties based on the data (by testing the independence and stationarity hypotheses 
more comprehensively) and progressively move toward more realistic models. In 
this sense, we consider that the results from this initial study are an illustration of the 
“historical approach” to extreme value statistics, which is still nevertheless widely 
used in certain engineering communities. 


Chapter 14
Predicting Storm Surges


Marc Andreewsky 


Abstract In this chapter, we present a second example of statistical estimation of 
extreme quantiles: millennial quantiles for storm surges at Brest (France), based 
on hourly sea-level measurements. We run sensitivity tests on parameter values, 
on the choice of analytic models for distributions, and on the statistical estimation 
methods chosen. Uncertainty in estimates and associated confidence intervals are 
also calculated and compared. 


14.1 Data Collection and Preprocessing 


14.1.1 Source Data 


The available data is an hourly record of the sea level at Brest (Fig. 0.1) over
the period 1846-2010, with an effective duration of observation of 150.25 years.
This data is the property of the French Navy and SHOM, and is accessible at the
REFMAR website.¹ The quality of the data was studied in a recent Ph.D. thesis
[625], among others. This study also takes advantage of retro-predictions of tide
levels obtained using the software PREDIT, which originated from a collaboration
between SHOM and EDF.


¹ http://refmar.shom.fr.


14.1.2 Constructing Storm Surge Samples


14.1.2.1 Calculating High-Tide Storm Skew Surges


Recall that the high-tide (HT) exceedance (storm skew surge) is defined as the difference
between the observed and predicted sea levels at high tide. As we are dealing with a
difference, it must be taken into account that, due to timing errors, the observed and
predicted curves are not necessarily in phase. This difficulty is illustrated
in Fig. 5.4.

Calculation of the HT exceedance proceeds as follows: from hourly tide measurements,
a quadratic polynomial is constructed, and its maximum $M_1$ is recorded.
Since the theoretical maximum $M_2$ is known, the HT exceedance is simply defined
as $M_1 - M_2$. Remember that $M_2$ is calculated without taking into account eustatic
change, which will also need to be dealt with (cf. Example 5.21).
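
The calculation just described can be sketched as follows. The text does not say how many hourly values enter the quadratic fit; as an assumption, the sketch passes a parabola through the highest hourly level and its two neighbors, and all names are hypothetical.

```python
def parabola_peak(levels):
    """Maximum of the parabola through the highest interior hourly level
    and its two neighbors (equally spaced observations assumed)."""
    i = max(range(1, len(levels) - 1), key=lambda k: levels[k])
    y0, y1, y2 = levels[i - 1], levels[i], levels[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature >= 0.0:
        # Degenerate case (flat or convex): fall back on the observed value.
        return float(y1)
    return y1 - (y0 - y2) ** 2 / (8.0 * curvature)

def skew_surge(observed_hourly, predicted_high_tide):
    """HT exceedance M1 - M2: observed high-tide maximum minus predicted one."""
    return parabola_peak(observed_hourly) - predicted_high_tide

# Hypothetical hourly levels (m) around one high tide, and predicted maximum.
hourly = [6.471, 6.831, 6.991, 6.951, 6.711, 6.271]
surge = skew_surge(hourly, 6.60)
```

In this made-up example the interpolated maximum is 7.00 m, slightly above the highest hourly reading of 6.991 m, and the skew surge is 0.40 m. Because only the two maxima are compared, a time lag between the observed and predicted curves does not affect the result.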

The use of HT exceedances has the advantage of simplifying the dependence
model between storm surges and the tide itself, and limits the risk of error due to the
phase shift mentioned above. However, information on exceedances is then only available
at high tide. We therefore have less data than if we calculated the hourly exceedances.
Nevertheless, those hourly exceedances can be strongly biased by this
phase shift.

At sites with a high tidal range like Brest, exceedance/tide dependence and the
error linked with the phase shift cannot be ignored. One way to deal with these is
to simply work with HT exceedances, which is the choice we make here. The HT
exceedances and the tide levels themselves can be considered independent in this
study because high-tide levels are always relatively large (unlike in the Mediterranean
Sea, for example). Furthermore, the risk of coastal flooding is a priori only high at
high tide. The HT exceedance variable therefore has several benefits.


14.1.3 Accounting for Eustatic Change 


The definition of the mean annual sea level (useful for calculating eustatic change)
we adopt here comes from the Permanent Service for Mean Sea Level (PSMSL).² To
calculate this, temporal means of measured levels are used, based on daily, monthly, 
and annual periods. These means are calculated while accounting for missing data. 

Variation in the mean annual sea level (eustatic change) is principally linked 
to variations in the climate, movements in the Earth’s crust, atmospheric pressure, 
and ocean currents. However, the tide retro-prediction software PREDIT does not 
take eustatic change into account: tides are retro-predicted around a constant (and 
current) mean annual level. Eustatic change must therefore be taken into account 
when calculating storm surges if we do not want to add bias to the calculations 
between theoretical and measured tide levels. 


² http://www.psmsl.org/data.

Fig. 14.1 Four models of eustatic change tested over the Brest storm surge data. Calibration based on
CAL1: annual means, CAL2: linear regression over the annual means, CAL3: a constant followed
by linear regression, and CAL4: two optimized linear regressions


Data has been recorded at Brest for well over 50 years (about 150
years, in fact), which is a sufficiently long period to observe whether a clear eustatic
trend emerges, which it does here. It is therefore possible to model variation in mean sea
level in order to remove it from measured levels. Four calibration methods, illustrated
in Fig. 14.1, have been tested for correcting the hourly levels for eustatic change:


CAL1: calibration based on annual means;

CAL2: calibration based on linear regression over the annual means;

CAL3: calibration based on a continuous piecewise linear regression in which the first
segment is constrained to have zero slope. The junction between the two
segments is chosen visually (this selects the year 1892);

CAL4: a calibration method similar to the previous one, except that the first segment is
constrained to have a non-negative slope, and the junction between the two
segments is found by minimizing the quadratic error (this selects the year
1915).


In the end, across the four methods, a maximal difference of only 4 cm is obtained
between the empirical quantiles of the storm surges corrected for eustatic
change. The choice of method therefore has little influence on
the extreme values. We have chosen to use CAL3 in this study, for which the change
in slope occurs in 1892. Its second linear segment (after 1892) has the equation:


Y = 1.45X + 1251.05 (mm). 

Fig. 14.2 Storm skew surges at Brest since 1846 after accounting for eustatic change using the
CAL3 method. The circled exceedance is related to the severe storm that occurred on 16 October
1987


According to this model, an increase of 1.45 mm occurs each year after 1892, leading 
to a shift of about 15 cm over the 20th century. The sample of eustatic-corrected 
exceedances is displayed in Fig. 14.2. 
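
As an illustration, the CAL3 model can be written as a small function. Several points below are assumptions rather than statements from the study: we take $X$ to be the calendar year, we take the flat first segment to sit at the value of the second segment in 1892 (by continuity), and we correct a measured level by shifting it to the mean level of a hypothetical reference year.

```python
BREAK_YEAR = 1892          # junction between the two segments (from the text)
SLOPE_MM_PER_YEAR = 1.45   # slope of the second segment (from the text)
INTERCEPT_MM = 1251.05     # intercept of the second segment (from the text)

def cal3_mean_level_mm(year):
    """Mean sea level (mm) under CAL3, assuming X is the calendar year.

    Before the junction the level is constant; continuity fixes it at the
    value of the linear segment in BREAK_YEAR.
    """
    x = max(year, BREAK_YEAR)
    return SLOPE_MM_PER_YEAR * x + INTERCEPT_MM

def correct_level_mm(observed_mm, year, reference_year=2010):
    """Shift an observed level to the mean level of the reference year
    (reference_year is a hypothetical choice for this sketch)."""
    return observed_mm + cal3_mean_level_mm(reference_year) - cal3_mean_level_mm(year)

rise_20th_century = cal3_mean_level_mm(2000) - cal3_mean_level_mm(1900)
```

The value of `rise_20th_century` is 145 mm, i.e., the shift of about 15 cm over the 20th century mentioned above.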


14.2 Methods 


14.2.1 Sampling Methods 


As described earlier in this book, two sampling methods are advocated in extreme 
value theory: peaks over threshold (POT) and block maxima (MAXB, usually blocks 
of one year). The difference between the two is the way in which they select extreme 
values. 


1. Under the POT approach, the sample used in the current study is constructed as
follows:


• only observed values greater than or equal to a predefined threshold $u_0$ are
taken into account;

• storms (characterized by one or several high tides above the chosen threshold)
are delimited in time using a redescent criterion (cf. Example 5.20) on the
storm surges; a storm ends when the value of the HT exceedance drops below
some threshold (e.g., $u_0/2$);

• if during a storm we observe several successive storm surges above the chosen
threshold, we retain only the maximal value among them (declustering);

• the resulting sample is intimately linked to the predefined threshold. As
this choice suffers from arbitrariness and uncertainty, we test several values,
keeping in mind that a typical year has 2-10 storms. In the end, however, it is
the total number of storm surges retained that is of most importance (for Brest,
even though we have fewer than two values per year, the fact that the time series
is quite long means that the final number of exceedances will be quite high).


2. Under the MAXB approach, the sample used here is constructed as follows:


• the original sample is divided up into maritime years (1 August-31 July), mainly
so that the whole of the northern hemisphere winter is contained in the same
year³;

• years with too many missing data are removed (the criterion used here is: more
than 20% missing data over a year and/or more than 10% missing data in
winter), except when we are sure that a year’s maximum has been recorded;

• the final sample to be fitted is then simply the set of annual maxima.
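
The POT construction in item 1 can be sketched in a few lines. The sketch assumes the HT exceedances are given in chronological order, one per high tide, and uses $u_0/2$ as the redescent level; the function and parameter names are ours, and details such as the handling of missing tides are ignored.

```python
def decluster(surges, u0, redescent_fraction=0.5):
    """Return one peak per storm from a chronological series of HT exceedances.

    A storm starts at the first value >= u0 and ends as soon as a value
    drops below redescent_fraction * u0; only its maximum is kept.
    """
    peaks = []
    current_max = None  # None means we are not currently inside a storm
    for s in surges:
        if current_max is None:
            if s >= u0:
                current_max = s
        elif s < redescent_fraction * u0:
            peaks.append(current_max)
            current_max = None
        else:
            current_max = max(current_max, s)
    if current_max is not None:
        peaks.append(current_max)  # storm still in progress at end of record
    return peaks

# Hypothetical series (m): two storms, the first with two values above u0.
example = [0.1, 0.7, 0.5, 0.8, 0.2, 0.65, 0.1, 0.3]
storm_peaks = decluster(example, u0=0.6)
```

In the example, 0.7 and 0.8 belong to the same storm because the intermediate value 0.5 does not fall below $u_0/2 = 0.3$, so only 0.8 is retained, followed by 0.65 for the second storm.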


14.2.2 Statistical Distributions and Tests 


In the context of this study, the distributions tested and fitted by maximum likelihood 
are shown in Tables 14.1 and 14.2. 

A certain number of tests, described in Sects. 4.1.3 and 5.4.3.2, can be used to 
check data quality, help choose the threshold (in the POT case), and look closer at 
the relevance of the parameters estimated for the fitted distributions in each setting 
(POT and MAXB): 


• Testing stationarity: we use a test for equal means and equal variance.

• Uniform distribution test: this tests whether the dates corresponding to peaks in
the sample are uniformly distributed over the period of observation.

• Testing for independence: for testing the independence of successive peaks over the
threshold, autocorrelation coefficients between these selected events are estimated
(cf. Sect. 5.4.1.1). Recall that the criterion used to select storm surges in the POT
setting is as follows: two exceedances close in time are both retained if and only if
they are both over the threshold $u_0$ and, between them, there is another exceedance
below $u_0/2$ (i.e., the redescent criterion; see Example 5.20).

• $\chi^2$ test: to run the POT method and look for the most appropriate threshold to
use, a $\chi^2$ test (among others) can be used, measuring the relevance of the estimated
probability distribution. This test is in addition to a visual analysis of the fit.


³ Most storms that hit France occur in winter.


Table 14.1 Distributions used to fit the POT sample

Distribution | Cumulative distribution function | Comments
GPD | $F(x) = 1 - \left\{1 + \frac{\xi}{\sigma}(x - u_0)\right\}^{-1/\xi}$ | Advocated by extreme value theory
Exponential | $F(x) = 1 - \exp\left(-\rho\,\{x - u_0\}\right)$ | Special GPD case for $\xi = 0$
Weibull | $F(x) = 1 - \exp\left(-\rho\,\{x - u_0\}^{\beta}\right)$ | $\beta$ is a shape parameter


Table 14.2 Distributions used to fit the MAXB sample

Distribution | Cumulative distribution function | Comments
GEV | $F(x) = \exp\left(-\left\{1 + \xi\,\frac{x - \mu}{\sigma}\right\}^{-1/\xi}\right)$ | Advocated by extreme value theory
Gumbel | $F(x) = \exp\left(-\exp\left(-\frac{x - \mu}{\sigma}\right)\right)$ | Special GEV case for $\xi = 0$


14.2.3 Evaluating Data and Results in Practice 


Below we list several criteria for deciding whether to retain or remove obtained 
samples, based on the tests listed above, in the MAXB and POT settings: 


• the number of extreme values selected to be used in the fit must be greater than or
equal to 20 at the very least;

• the mean number of extreme values per year generally needs to be 2-8, perhaps
stretching to 1-20. However, in the end, it is the total number of events retained
which is more important;

• the p-value associated with the occurrence of peaks with respect to a Poisson
distribution should be at worst 5%, though, given the specific case, it could be set
as small as 0.1%;

• tests for stationarity, independence of peaks, and uniform distribution of peak
dates are accepted with very small error risks $\alpha \ll 1$, ideally following the
recommendations in [417] (in general, values of 5% and 10% are still commonly used
in the literature).

In addition, several exclusion and acceptance criteria for estimated distributions
can be used, which depend in particular on the confidence level chosen to define
confidence intervals over estimated functions of interest (cf. Sect. 4.13). In this study,
this is set at 70%, a typical choice for dimensioning (see for example, [15]). Depending
on the project, it may be desirable to increase this value to 90% or even 95%
[321]. The criteria are as follows.


• We set the risk associated with the $\chi^2$ goodness-of-fit test to a low threshold $\beta$ (e.g.,
$\beta = 10\%$, with 10 equiprobable classes taken into account in the calculations).
The table of $\chi^2$ fractiles then tells us that the value of the test statistic needs to be
less than 14.7, so that we have a one in ten chance of erroneously concluding
that the distribution of the data and that of the proposed fit are not the same.

• One exclusion criterion is based on characteristics of the graphical representation
of the fitted distribution: if it has an empirical tail distribution (the last 4 or more
values) outside the 70% confidence interval and pointing in a radically different
direction to that of the distribution being fitted, the estimated distribution is not
retained.

• One criterion of admissibility is that the value of the $\chi^2$ criterion (4.2) associated
with the fit is small or minimal for the selected threshold.

• If, in a neighborhood of the selected threshold, the plots of the centennial (or
millennial) quantile and of the parameters of the fitted distributions against the
threshold are relatively stable, these add credibility to the choice of threshold.

• Finally, if the graphical representation of the fitted distribution shows that a large
number of observations are contained within the 70% confidence interval, in
particular for the extremities of the tail, it also adds credibility to the fit.
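
The first criterion can be illustrated with a short sketch of a $\chi^2$ statistic computed over equiprobable classes. The exact form of criterion (4.2) is not reproduced in this chapter, so the function below is a generic version given as an assumption; the exponential parameters and the sample are hypothetical, and the critical value 14.7 is the one quoted above.

```python
import math

def chi2_equiprobable(sample, cdf, n_classes=10):
    """Chi-squared statistic using n_classes classes of equal probability
    under the fitted distribution described by cdf."""
    n = len(sample)
    counts = [0] * n_classes
    for x in sample:
        # The class index is given by the fitted probability level of x.
        k = min(int(cdf(x) * n_classes), n_classes - 1)
        counts[k] += 1
    expected = n / n_classes
    return sum((c - expected) ** 2 / expected for c in counts)

def exponential_cdf(x, u0, rho):
    """Exponential distribution of Table 14.1 above threshold u0."""
    return 1.0 - math.exp(-rho * (x - u0)) if x > u0 else 0.0

CRITICAL_VALUE = 14.7  # value quoted in the text for 10 classes and a 10% risk

u0, rho = 0.63, 10.0   # hypothetical parameters for the illustration
sample = [u0 - math.log(1.0 - (i + 0.5) / 100) / rho for i in range(100)]
stat = chi2_equiprobable(sample, lambda x: exponential_cdf(x, u0, rho))
accepted = stat < CRITICAL_VALUE
```

The synthetic sample is spread perfectly evenly over the ten classes, so the statistic is zero and the fit is trivially accepted; real samples give positive values to be compared with 14.7.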


14.3 Estimation Using Annual Maxima 


The MAXB approach consists of selecting the largest storm surge of each year 
and fitting a GEV or Gumbel distribution to the resulting sample. Here, the test 
for stationarity is accepted for risks of 10% and 20% (remember this is Student’s 
t-test of equality of the means) for all time intervals. The fits obtained are shown 
in Fig. 14.3. Note that the fits associated with the GEV (top) and Gumbel (bottom) 
distributions are almost identical. The millennial return level estimation results are 
shown in Table 14.3. 

In Table 14.3, we can see several differences in the point values, depending on
the distribution. We also note that the differences seen between the two distributions
tested for a given quantile are smaller than the lengths of the estimated confidence
intervals. The fits were obtained using maximum likelihood, and the parameter
estimates are shown in Table 14.4. We further note that the $\chi^2$ values do not contradict
these fits.

Fig. 14.3 GEV (top) and Gumbel (bottom) fits along with 70% confidence intervals


Table 14.3 Millennial return level and its 70% confidence interval, and length of this confidence
interval (CI), for the GEV and Gumbel distributions under the MAXB approach

Distribution | 70% lower bound (m) | Millennial level (m) | 70% upper bound (m) | Length of CI (m)
GEV | 1.2 | 1.32 | 1.44 | 0.24
Gumbel | 1.32 | 1.38 | 1.44 | 0.12


Table 14.4 Parameter estimates for the GEV and Gumbel distributions along with p-values of the
$\chi^2$ test for a 10% significance level

Distribution | Parameter | Value
GEV | $\mu$ | 0.56
GEV | $\sigma$ | 0.12
GEV | $\xi$ | 0.024
GEV | p-value $\chi^2$ | 10.0
Gumbel | $\mu$ | 0.56
Gumbel | $\sigma$ | 0.12
Gumbel | p-value $\chi^2$ | 9.71


14.4 Estimating Storm Surges Using POT Approach 


14.4.1 Search for Optimal Thresholds 


Three distributions are estimated from a sample of independent exceedances (storm
surges) over some threshold (i.e., the POT approach): the theoretically based GPD
distribution and two other distributions, the Weibull and exponential distributions,
considered statistically reasonable based on the data histogram and the historical
treatment of such data, even though they are at odds with the theory (cf. Sect. 3.2.1).
Their historical use in maritime hydrology is reflected in their presence in traditional
software tools like ASTEX [433], developed by LNHE. We consider that the
redescent criterion (introduced in Example 5.20) allows us to claim that two
selected exceedances come from different storms. For HT exceedances, we consider
that successive storms are separated by at least 24 hours.

For each distribution tested, we look for an optimal threshold by minimizing the $\chi^2$
criterion (4.2) using an iterative approach. The candidate optimal thresholds for
each are as follows:


• the best $\chi^2$ for the exponential distribution is obtained for a threshold of 0.63 m,
with 96 exceedances selected above it;

• the best $\chi^2$ for the GPD distribution is obtained for a threshold of 0.60 m, with
129 exceedances selected above it;

• the best $\chi^2$ for the Weibull distribution is obtained for a threshold of 0.64 m, with
85 exceedances selected above it.


In all three cases, the stationarity, independence, and uniform distribution hypotheses
are tested and validated, as is the Poisson distribution hypothesis. Moreover, the
differences between the optimal thresholds are very small, and for each, the number of
exceedances selected is relatively high (which is a positive point in general in terms
of reliability of the fitting results). We also wish to check, in addition to a visually
acceptable fit, whether the results are relatively stable across the board for the
millennial quantiles, upper and lower bounds of the 70% confidence intervals, and
parameters around the candidate optimal thresholds, for the three distributions tested
(Figs. 14.4, 14.5, 14.6, 14.7, 14.8 and 14.9).

Fig. 14.4 Variations of the exponential parameter as a function of the threshold. The circle indicates
the zone in which the threshold associated with the optimal $\chi^2$ is found for this distribution

Fig. 14.5 Variations in the upper and lower bounds of the point estimate and its 70% confidence
interval as a function of the threshold for the exponential distribution. The circle indicates the zone
in which the threshold associated with the optimal $\chi^2$ is found for this distribution

Fig. 14.6 Variation in the GPD distribution’s parameters as a function of the threshold. The circle
indicates the zone in which the threshold associated with the optimal $\chi^2$ is found for this distribution

Fig. 14.7 Variation in the upper and lower bounds of the point estimate and its 70% confidence
interval as a function of the threshold for the GPD distribution. The circle indicates the zone in
which the threshold associated with the optimal $\chi^2$ is found for this distribution

Fig. 14.8 Variation in the Weibull distribution’s parameters as a function of the threshold. The
circle indicates the zone in which the threshold (= 0.64) associated with the optimal $\chi^2$ is found for
this distribution

Fig. 14.9 Variation in the upper and lower bounds of the point estimate and its 70% confidence
interval as a function of the threshold for the Weibull distribution. The circle indicates the zone in
which the threshold associated with the optimal $\chi^2$ is found for this distribution

Fig. 14.10 The exponential distribution fit (threshold = 0.63 m)


In each case, there is no sign of high instability around the optimal threshold 
candidates for either the lower or upper bounds of the 70% confidence interval, 
quantile values, or parameter values (if this had not been the case, we could of course 
have considered other candidate thresholds). The fits obtained for the distributions 
tested are shown in Figs. 14.10, 14.11, and 14.12. We see in these one exceedance 
located far from the 70% confidence intervals, corresponding to the major storm from 
October 1987 mentioned earlier. Apart from this outlier, the fits are relatively good 
(the other points are inside the confidence intervals—themselves not overly wide). 

Ultimately, the POT approach is broadly relevant here, as indicated by the robust 
estimates obtained. The optimal thresholds and parameter estimates for each distri- 
bution are shown in Table 14.5. 


14.5 Conclusion and Discussion


In this example, the estimated quantiles (return levels) were essentially the same,
whichever fitting method we used and whether it was applied to the distribution from
the theory or to other “expert” distributions. But these results are not generic.
The selection of a quantitative approach (the most conservative or otherwise) must
always be conditioned on the importance of the installation requiring protection; a
“naturally” conservative approach (promoted by regulatory authorities in France as in
other countries) would be to choose the upper bound of the associated confidence
interval.

Fig. 14.11 The Weibull distribution fit (threshold = 0.64 m)

Fig. 14.12 The GPD distribution fit (threshold = 0.60 m)


Table 14.5 Optimal thresholds and parameter estimates for the GPD, Weibull, and exponential
distributions

Distribution | Threshold (m) | Parameters | Lower limit 70% (m) | Millennial level (m) | Upper limit 70% (m)
GPD | 0.6 | $\sigma = 0.096$, $\xi = -0.031$, $\lambda = 0.86$ | 1.21 | 1.27 | 1.33
Weibull | 0.64 | $\rho = 11.6$, $\beta = 1.07$, $\lambda = 0.57$ | 1.12 | 1.21 | 1.29
Exponential | 0.63 | $\rho = 10.36$, $\lambda = 0.639$ | 1.19 | 1.25 | 1.32
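
The millennial levels of the exponential and Weibull fits can be recomputed from Table 14.5 by solving $1 - F(z) = 1/(\lambda T)$ with the distribution functions of Table 14.1. This sketch rests on one assumption that the table does not spell out: the third quantity listed for each distribution (0.639 and 0.57 here) is read as the mean number of exceedances per year, which matches 96 and 85 exceedances over 150.25 years. Function names are ours.

```python
import math

def exponential_return_level(u0, rho, lam, T):
    """Level exceeded on average once every T years when excesses over u0
    are exponential with rate rho and occur lam times per year."""
    return u0 + math.log(lam * T) / rho

def weibull_return_level(u0, rho, beta, lam, T):
    """Same quantity for the Weibull form F(x) = 1 - exp(-rho (x - u0)^beta)."""
    return u0 + (math.log(lam * T) / rho) ** (1.0 / beta)

z1000_exp = exponential_return_level(0.63, 10.36, 0.639, 1000)
z1000_wei = weibull_return_level(0.64, 11.6, 1.07, 0.57, 1000)
```

Under this reading, the sketch gives about 1.25 m for the exponential fit and about 1.21 m for the Weibull fit, in line with the millennial levels of Table 14.5.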


Note also that the 1987 storm event, sitting outside the calculated confidence 
intervals, is a red flag to some experts, leading to doubts as to the pertinence of the 
model fitting. Regional analysis (see Chap. 7) can help provide answers in this type 
of situation. 

Finally, note that the plots shown in this chapter, linked to statistical tests, are
driving factors in decision-making (accepting or not an estimated statistical
distribution), but can also be supplemented by further distribution selection tools, e.g.,
QQ-plots (cf. Fig. 6.6) and mean exceedance plots [333], which display the evolution
of $E[X - u \mid X > u]$ as a function of $u$. As indicated in Proposition 6.4, the latter is
a linear function of $u$ when $X \mid X > u$ follows a GPD distribution.
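
The empirical counterpart of this quantity is easy to compute, as in the following minimal sketch (function names are ours). Plotting the returned pairs and looking for the range of thresholds over which the points are roughly aligned is the usual way of reading such a plot.

```python
def mean_excess(sample, u):
    """Empirical estimate of E[X - u | X > u]."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation above the threshold")
    return sum(excesses) / len(excesses)

def mean_excess_curve(sample, thresholds):
    """Pairs (u, mean excess) for each threshold with at least one exceedance."""
    top = max(sample)
    return [(u, mean_excess(sample, u)) for u in thresholds if u < top]
```

For instance, `mean_excess([1, 2, 3, 4, 5], 2)` averages the excesses 1, 2, and 3 and returns 2.0.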


Chapter 15
Forecasting Extreme Winds


Sylvie Parey 


Abstract This chapter considers the modeling and forecasting of extreme wind 
values. This one-dimensional problem is dealt with in depth and provides a third 
example of the deployment of the “classic” methodology for the treatment of natural 
extreme values. 


15.1 Context of the Study 


15.1.1 Study Variable and Probability Levels 


The focus of this study is the maximal instantaneous wind speed in Nantes (Loire-
Atlantique department of France, cf. Fig. 15.1) at a height of 10 m. We are looking
to estimate the 50-year return level, along with the upper bound of its 95% confidence
interval. The variable of interest is exactly the same as the study variable, so it can
be investigated directly.


15.1.2 Collecting Data 


Data source. In general, wind studies in France involve data recorded at Météo- 
France weather stations, which are spread out across the country. The data includes 
the daily maximal wind speed and the hour in which it occurred. Our study will use 
this daily maximal wind speed data. The Nantes weather station data covers the period 
1 January 1972-31 December 2003, i.e., 32 years or 11,688 days, and in particular 
contains the (non-aberrant) data for 26 December 1999: Cyclone Lothar. As this 
dataset is quite complete, there is no need to include other sources of information. 
Fig. 15.1 Histogram of the daily maximal wind speed in Nantes


Data preprocessing. The Météo-France dataset has very few missing values: less
than 10% across the whole set of weather stations. In fact, for the Nantes station,
there are no missing data. Nevertheless, in the general case, a missing value on a
given calendar date is replaced by the mean of the values observed on that date in
the other years. Note that when data are missing, it is important to make sure that
the gaps do not correspond to dates on which extreme events occurred.

The time series of daily maximal wind speeds in Nantes is made up of whole
numbers between 3 and 37 m/s. As the statistical theory of extreme values is designed
for continuous variables rather than discrete ones, a small continuous noise in the
form of a random draw between −1/2 and 1/2 has been added to the values. This
is a heuristic known as jittering (described in Sect. 5.4.1.3), and it can be seen as a
way to compensate for errors due to rounding by the measuring equipment.
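
As an illustration, the jittering step can be sketched in a few lines of Python (rather than with the R tools used in the study). The sketch assumes the noise is uniform on [−1/2, 1/2], which the text does not state explicitly; the function name `jitter` and the sample values are hypothetical.

```python
import random

def jitter(values, half_width=0.5, seed=None):
    """Add independent uniform noise in [-half_width, half_width] to each value.

    Assumption: uniform noise; the study only says "a random draw".
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-half_width, half_width) for v in values]

# Hypothetical integer-valued daily maxima (m/s), not the Nantes data.
speeds = [12, 12, 15, 23, 23, 23, 31]
jittered = jitter(speeds, seed=0)
```

Each jittered value stays within half a unit of the recorded integer, while ties between identical readings are broken.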

In addition, to simplify the process (e.g., in terms of matrix operations), the data for 
each 29 February has been removed, such that each year has the same number of days. 

We can then plot the histogram of observed daily maximal wind speeds after 
preprocessing, which is shown in Fig. 15.1. 


15.2 Extreme Value Sampling 


Recall that two types of sampling are possible: block maxima (MAXB) and
threshold exceedance (POT). The properties of the resulting sub-samples (e.g.,
stationarity, independence, and sample size) must then be analyzed.
Fig. 15.2 Distribution of the instantaneous daily maximal wind speed per season


15.2.1 Sub-sample Analysis 


Wind is a seasonal phenomenon, and so is strong wind. A seasonal analysis, shown in
Fig. 15.2, confirms that the wind is strongest in autumn and winter. It would therefore
be wrong to suppose that extreme wind values are evenly spread across the year; we
will thus limit our study to autumn and winter. In order to select precisely the months
to be retained, we can look at how the one to five largest values of each year are
distributed across the months, compared with an even spread over the twelve months
(Table 15.1). It seems reasonable to retain October–March as the months in which
strong winds occur.
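
The "evenly distributed" reference in Table 15.1 is the count expected if the k largest values of each of the 32 years fell uniformly over the twelve months, i.e., 32k/12. A minimal Python sketch of this comparison follows; the function names and the toy records are hypothetical and are not the Nantes data.

```python
from collections import Counter

def even_spread(n_years, k):
    """Expected count per month if the k largest values of each year were evenly spread."""
    return n_years * k / 12

def monthly_counts_of_top_k(records, k):
    """records: iterable of (year, month, value).

    Counts, per month, how often that month hosts one of the k largest
    values of its year.
    """
    by_year = {}
    for year, month, value in records:
        by_year.setdefault(year, []).append((value, month))
    counts = Counter()
    for observations in by_year.values():
        for _, month in sorted(observations, reverse=True)[:k]:
            counts[month] += 1
    return counts

# Toy records (year, month, wind speed in m/s), for illustration only.
records = [(2000, 1, 30.0), (2000, 7, 10.0), (2000, 12, 25.0),
           (2001, 2, 28.0), (2001, 1, 27.0), (2001, 8, 9.0)]
counts = monthly_counts_of_top_k(records, k=2)
reference = [round(even_spread(32, k), 2) for k in range(1, 6)]
```

Months whose counts lie well above the even-spread reference are natural candidates for the strong-wind season.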


15.2.2 Sampling Methods 


Each autumn-winter period straddles two years. In order to work with years which
entirely contain each seasonal cluster of extreme values, we shift the definition of a
year so that it starts on 1 April and ends on 31 March. Under the MAXB approach,
we retain the maximum value obtained each year, which, given the length of the time
series, means a total of 31 values. As for the POT approach, it begins by looking for a
threshold above which the exceedances best satisfy the properties of the generalized
Pareto distribution. In particular, this means linearity of the mean excess as a function
of the threshold (see Proposition 6.4) and stability of the estimated shape parameter.

Table 15.1 Comparison of the distribution of 1–5 maxima with evenly distributed data

Max. num.  Evenly distributed  Jan.  Feb.  Mar.  Apr.  May  June
1  2.67  6  8  0  0  0
2  5.33  14  11  3  2  1
3  8.00  23  13  10  5  2  1
4  10.67  32  16  10  6  4  1
5  13.33  38  23  13  7  4  1

Max. num.  Evenly distributed  July  Aug.  Sep.  Oct.  Nov.  Dec.
1  2.67  0  0  2  6
2  5.33  0  0  3  3  12
3  8.00  1  0  3  8  10  20
4  10.67  1  0  5  14  13  26
5  13.33  1  0  6  14  20  33


To study these, we plot (using the R package extRemes) the evolution of the
mean excess and the shape parameter of the fitted distribution for various values of
the threshold (Fig. 15.3). In addition, to ensure independence, if consecutive values
exceed the threshold, only the largest is retained (declustering, cf. Sect. 6.5.2.1). The
mrlplot function of the extRemes package plots the evolution of the mean excess
above different thresholds. We can also recode these diagnostics by accessing and
modifying the function’s code; here we do this with the aim of zooming in on the
highest thresholds. Figure 15.3 indicates that a threshold of 23 m/s appears
reasonable.
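
The two ingredients of this step, declustering and the empirical mean excess, can be sketched as follows. This is a Python illustration under simplifying assumptions, not the extRemes implementation: clusters are taken to be runs of consecutive exceedances, and the function names and the short series are hypothetical.

```python
def decluster(series, threshold):
    """Keep only the largest value of each run of consecutive exceedances."""
    peaks, current = [], None
    for x in series:
        if x > threshold:
            current = x if current is None else max(current, x)
        elif current is not None:
            peaks.append(current)
            current = None
    if current is not None:      # series ends inside a cluster
        peaks.append(current)
    return peaks

def mean_excess(values, threshold):
    """Empirical mean of (x - u) over the values x exceeding u."""
    excesses = [x - threshold for x in values if x > threshold]
    return sum(excesses) / len(excesses) if excesses else float("nan")

# Toy daily series (m/s), for illustration only.
series = [10, 24, 26, 25, 12, 30, 11, 23.5, 27, 9]
peaks = decluster(series, 23)    # [26, 30, 27]
curve = [(u, mean_excess(peaks, u)) for u in (23, 24, 25)]
```

Plotting `curve` against the threshold and looking for the range over which it is roughly linear is the idea behind the mean excess diagnostic.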


15.2.3 Detecting Non-stationarity 


Stationarity is a fundamental requirement for fitting extreme value distributions. It
is necessary to ensure that the mean and standard deviation remain stable over time.
Running the Mann–Kendall test (see Sect. 5.4.3.2), using the Kendall package in R,
on the annual maxima of the autumn/winter wind data does not identify any
significant trend: at the 5% level, the test does not reject the hypothesis of
an absence of trend in the sample, even though the associated plot (Fig. 15.4) seems
to indicate a slight increase in the maximal speed over time.
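
For readers who want to see what the test computes, here is a minimal Python sketch of the Mann–Kendall statistic with its usual normal approximation. It assumes there are no ties (as is the case after jittering) and is not the code of the Kendall package; the function name is hypothetical.

```python
import math

def mann_kendall(x):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, p), where p is the two-sided p-value obtained from the
    normal approximation with continuity correction.
    """
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return s, z, p

s_up, z_up, p_up = mann_kendall(list(range(10)))   # strictly increasing toy series
s_flat, z_flat, p_flat = mann_kendall([3, 1, 2])
```

The hypothesis of no trend is rejected at the 5% level when the returned p-value is below 0.05.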


15.2.4 Data Independence Analysis 


We now test whether the resulting samples are composed of independent and identically
distributed data. Figure 15.5 shows the autocorrelation coefficients (Sect. 5.4.1.1)
computed using the R function acf. As the vertical lines remain within the
confidence interval (dotted lines), the independence hypothesis is not rejected. As
for the POT approach with its 130 values over the threshold of 23 m/s, the estimated
autocorrelation does not invalidate the independence hypothesis either (Fig. 15.6).

Fig. 15.3 Mean excess (top) and evolution of the shape parameter (bottom) as a function of the
threshold
Fig. 15.4 Applying the Mann–Kendall test to the maximal annual wind speeds
Fig. 15.5 Autocorrelation of the block maxima sample
Fig. 15.6 Autocorrelation of the POT sample with a threshold of 23 m/s
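
The autocorrelation check can be reproduced in outline as follows. This Python sketch computes the sample autocorrelation and the usual white-noise band ±1.96/√n; treating the dotted lines of the R plot as this band is an assumption, and the function names are hypothetical.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for an uncorrelated sample of size n."""
    return z / math.sqrt(n)

rho = acf([1, 2, 3, 4], max_lag=1)      # toy series
band_maxb = white_noise_band(31)        # 31 block maxima
band_pot = white_noise_band(130)        # 130 threshold exceedances
```

With only 31 block maxima the band is wide (about ±0.35), so only fairly strong serial dependence would be detected.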


15.3 Statistical Estimation 


For each of the approaches (MAXB and POT), the parameters of the corresponding
extreme value distribution can be estimated using maximum likelihood. As for the
confidence intervals for extreme quantiles, we use the delta method (cf. Theorem 4.4);
this can be run using the R function return.level.


15.3.1 MAXB Approach 


For the first fit, the estimated parameters and their 95% confidence intervals are as 
follows: 


μ = 27.4 [26.3; 28.6],
σ = 2.75 [1.92; 3.58],
ξ = −0.05 [−0.36; 0.27].


The confidence interval around the shape parameter ξ is quite wide, which is not
surprising given the low number of maxima (31). In addition, it covers 0, which
suggests that a Gumbel distribution could be chosen. Indeed, the point estimate for
ξ is itself quite close to 0.


15.3.2 POT Approach


As a reminder, according to extreme value theory, if the block maxima sample has
a GEV distribution, then the sample of exceedances above a high threshold u has a GPD.
Estimation of this distribution’s parameters can be done using the fevd function in
the R package extRemes with u = 23 m/s. In the initial fit, the estimated parameters
and their 95% confidence intervals are as follows:


σ = 4.07 [3.10; 5.03],
ξ = −0.18 [−0.34; −0.01].


The shape parameter ξ is here significantly negative (its confidence interval excludes 0),
unlike in the previous case. Remember, however, that with a threshold of 23 m/s, we
have 130 independent values above the threshold, whereas before, only 31 values were
used. This second fit can therefore be considered more reliable.


15.4 Tests and Graphical Aids 


15.4.1 MAXB Approach 


The plots shown in Fig. 15.7 help us to visualize the goodness of fit of the parametric 
distribution to the obtained sample. The plots at the top show the fit with respect to 
lines of reference; the whole of the sample appears to be well-modeled, including 
points at the extremes. The bottom-right plot shows the modeled and observed return 
level as a function of the return period, given on a log scale. We see that all of the 
data points are contained within the confidence interval. 


15.4.2 POT Approach 


Here too the goodness of fit can be visualized in the same ways, as shown in Fig. 15.8. 
Overall, the fit is reasonable, and broadly comparable to the block maxima one. 
Fig. 15.7 Fitting a GEV distribution to the block maxima sample
Fig. 15.8 Fitting a GPD distribution to the sample of values over the threshold of 23 m/s


15.5 Results


15.5.1 Influence of Jittering 


In this investigation, the study variable is similar to the variable of interest, so cal- 
culations are straightforward. The results obtained for a 50-year return level and its 
95% confidence interval are the following: 


MAXB POT 
Return level (m/s) 37.3 37.0 
95% CI [32.1 ; 42.4] [33.3 ; 40.7] 
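
As a rough cross-check, the point estimates can be recomputed from the rounded parameters of Sect. 15.3 with the standard GEV and GPD return level formulas. The Python sketch below is not the computation performed by return.level; in particular, the exceedance rate of 130/31 events per (shifted) year is an assumption derived from the sample sizes quoted in the text, and the function names are hypothetical.

```python
import math

def gev_return_level(mu, sigma, xi, period):
    """Level exceeded by the annual maximum with probability 1/period (xi != 0)."""
    y = -math.log(1 - 1 / period)
    return mu - sigma / xi * (1 - y ** (-xi))

def gpd_return_level(u, sigma, xi, rate, period):
    """Level exceeded on average once every `period` years, given `rate`
    exceedances of the threshold u per year (xi != 0)."""
    return u + sigma / xi * ((rate * period) ** xi - 1)

# Rounded estimates from Sect. 15.3; assumed rate of 130 exceedances in 31 years.
z_maxb = gev_return_level(27.4, 2.75, -0.05, 50)          # about 37.1 m/s
z_pot = gpd_return_level(23, 4.07, -0.18, 130 / 31, 50)   # about 37.0 m/s
```

Both values are close to those of the table above; small differences are to be expected since the inputs are rounded.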


The two methods give similar results. However, in order to better understand the
impact of jittering, i.e., adding a small level of noise to the original discretized data,
we use a bootstrap approach [244], performing jittering 1000 times and recalculating
the return levels and confidence intervals, taking the mean at the end. We obtain the
following results:


MAXB POT 
Return level (m/s) 37.0 36.9 
95% CI [32.1 ; 41.9] [33.2 ; 40.7] 


The results from this are very similar to what we originally found. Next, to get an 
idea of sensitivity to the choice of threshold, we run all of the POT calculations again 
with a threshold of 22.5 m/s, then 23.5 m/s. Averaging again over 1000 jitterings, we 
obtain 


• for a threshold of 22.5 m/s (which leads to retaining 140 values): 36.7 [33.1 ; 40.3];
• for a threshold of 23.5 m/s (which leads to retaining 120 values): 37.1 [33.2 ; 41.1].


The results are thus little affected by small changes in the choice of threshold. 


15.5.2 Validation 


The highest value observed in this 32-year time series was 37 m/s on 3 February 
1990. The maximum speed on 26 December 1999 was 35 m/s. In fact, 98% of the 
daily maximum speeds are below 25 m/s. Thus, it would appear reasonable that 
37 m/s corresponds to a 50-year level. 

Table 15.2 Estimation of the probability that the wind speed exceeds two values (32 and 37.9 m/s)
after GEV modeling

            P(X > 32)                  P(X > 37.9)
Estimate    1.6·10⁻¹                   2.0·10⁻²
95% CI      [0.8·10⁻¹ ; 2.8·10⁻¹]      [1.10·10⁻⁴ ; 9.0·10⁻²]

Fig. 15.9 Profile likelihoods obtained by varying the probability of exceeding a fixed value and
optimizing for each of these probabilities the likelihood with respect to the distribution’s scale
parameter


So far, we have estimated the wind speeds associated with return periods of 1/p.
However, once the distribution of the maxima has been estimated, nothing stops us
from estimating the probability associated with a given maximum, provided that this
value does indeed lie in the domain of the extreme values of the series. For example,
for the wind speed variable, we can estimate the exceedance probability associated
with the value 32 m/s by noticing that P(X > x) = 1 − P(X ≤ x). In the GEV setting,
we thus obtain

\[ P(X > 32) = 1 - F_X(32) \approx 0.16, \]


where here F_X is the cumulative distribution function of the GEV distribution esti-
mated using the mean parameters obtained over the 1000 fits. In the POT setting,
with the threshold of 23 m/s, we have

\[ P(X \ge 32 \mid X > 23) = 1 - F_X(9) \approx 0.24, \]

where F_X is now the cumulative distribution function of the estimated GPD distribu-
tion associated with the threshold of 23 m/s. The estimated survival functions for the
two approaches are shown in Fig. 15.10. Recall that the survival function is defined
as 1 − F(x), where F is the cumulative distribution function of the random variable.
These plots also show the estimation of the upper bound of extreme wind speeds for
the threshold exceedance approach. Under the MAXB approach, there is high uncer-
tainty in the value of the shape parameter ξ, and in the average of the 1000 jitterings,
its value is close to 0, leading to an extremely high and physically questionable upper
bound. As for the POT approach, the shape parameter is on average −0.18 and the
estimated upper bound of the distribution’s support (finite when ξ < 0) is 45.7 m/s.

Fig. 15.10 Top: survival function for the GEV model estimated using 500 fits. Bottom: mean
annual exceedance frequency for the POT model estimated for the threshold of 23 m/s for two
distinct intervals. The upper bound for the maximal wind speed under this model is shown in the
bottom-right plot (red cross)


GEV modeling POT modeling 
Upper bound (m/s) Tak 45.7 


Remark 15.29 Recall that when ξ < 0, the upper bound of the support of the distri-
bution of extreme values is finite. Its value does not correspond to a return level; in
effect, above this upper bound, all exceedance probabilities are zero (conditional on
the pertinence of the given distribution). If its value is highly sensitive to that of ξ,
the bound is unlikely to be physically realistic.
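
The quantities discussed in this section follow directly from the fitted parameters. The Python sketch below uses the rounded estimates of Sect. 15.3 (not the averages over the 1000 jitterings, which are not all reported), so its outputs are only indicative; the function names are hypothetical.

```python
import math

def gev_exceedance(x, mu, sigma, xi):
    """P(annual maximum > x) under a GEV(mu, sigma, xi) with xi != 0."""
    t = 1 + xi * (x - mu) / sigma
    if t <= 0:
        # Outside the support: above the upper bound (xi < 0) or below the lower one.
        return 0.0 if xi < 0 else 1.0
    return 1 - math.exp(-t ** (-1 / xi))

def upper_bound(location, sigma, xi):
    """Upper end point of the support: finite only when xi < 0.

    `location` is mu for a GEV and the threshold u for a GPD.
    """
    return location - sigma / xi if xi < 0 else math.inf

p32 = gev_exceedance(32, 27.4, 2.75, -0.05)   # about 0.16
pot_bound = upper_bound(23, 4.07, -0.18)      # about 45.6 m/s
gev_bound = upper_bound(27.4, 2.75, -0.05)    # far above any observed speed
```

The GEV bound moves by tens of m/s for small changes in a shape parameter close to 0, which illustrates the sensitivity mentioned in the remark.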


To conclude the study, the confidence intervals for the probabilities of exceeding
given values (lying within the domain of the extreme values) have been
obtained using profile likelihood. These are summarized in Table 15.2. The 50-year
return level obtained with the GEV distribution, whose parameters are the mean 
ones, thus corresponds well to a 50-year level. Figure 15.9 illustrates the estimation 
of these probabilities along with their 95% confidence intervals obtained using profile 
likelihood. 


Chapter 16
Conjunction of Rainfall in Neighboring
Watersheds


Nicolas Roche and Anne Dutfoy 


Abstract This chapter presents, in great detail, a statistical study of extreme values
in a bivariate setting, where the hazards considered together are rainfall events in
neighboring watersheds. This type of natural hazard can cause destructive flooding.
The different steps of the treatment methodology are clarified, and particular attention
is paid to checking the conditions under which the theory applies.


16.1 Introduction 


In France, the dimensioning of hydraulic works is based on the estimation of extreme
quantiles of hydrometeorological events. The quantiles to examine, as well as the
methods for estimating them, are not the subject of any rulebook, official or other-
wise. Nevertheless, engineers can, for instance, base their work on the practices of
CTPBOH¹ or the recommendations of CFBR² [5, 16].

In particular, EDF has developed the SCHADEX method, presented in Chap. 18,
which aims to estimate extreme flow quantiles by combining extreme rainfall and the
hydrological state of a watershed. The joint use of probabilistic rainfall, continuous
climate chronicles on the watershed in question, and a rainfall-runoff model to eval-
uate hydrological outcomes for the latter ultimately makes it possible to determine
the distribution of extreme flows.

In this type of approach, when studying neighboring watersheds, it becomes ne- 
cessary to take into account dependencies between these two watersheds, in particular 
in view of anticipating the occurrence of extreme-valued events. 

In the following, we are going to study the case of the watersheds of the Drac and
Isère Rivers at Grenoble (Isère department of France, cf. Figure 0.1). After an analysis
of the daily rainfall in the two watersheds, we will examine dependence in their
extreme values. After modeling this, we will attempt to define relevant conditional
probability distributions.


16.2 Data Analysis 


The data used to characterize the Isère and Drac watersheds at Grenoble, recorded 
at the stations W1410010 and W2832020, respectively, can be obtained from the 
HYDRO databank*. 

The Isère River is an important left-bank tributary of the Rhône River. Its source
is in the Graian Alps of Savoie, and it merges with the Rhône a few kilometers north
of Valence. At Grenoble, its watershed is 5720 km² and its flow around 180 m³/s.

The Drac River is a tributary of the Isère River in south-eastern France.
With a length of around 130 km, its source is in the Hautes-Alpes department, and it
flows into the Isère River a few kilometers downstream of Grenoble. Its watershed
is 3550 km² and its flow around 100 m³/s when it merges with the Isère River.


16.2.1 Rainfall Data in Watersheds 


Daily rainfall at specific locations is measured with the help of rain gauges and given
in mm (1 mm of rainfall corresponds to 1 L/m²). The watershed rainfall data used
in this example are the output of an original interpolation method based on EDF-
defined weather patterns [348]. The latter makes it possible to deal with the problem
of underestimation of precipitation at altitude where rainfall capture by measuring
devices is sometimes deficient (the presence of strong winds and a preponderance
of heavy rain are the two main factors degrading precipitation capture at altitude).
For this example, we have 53 years of daily rainfall data in the watersheds of the
Rivers Drac (before it joins with the Isère River) and Isère (at Grenoble) from 1
January 1953 to 31 December 2005 (Fig. 16.1). This corresponds to 19,358 daily
rainfall values for each watershed.


16.2.2 Marginal Data Analysis 


For the statistical analysis that follows, it is necessary to render the daily rainfall 
continuous, i.e., as if coming from a continuous bivariate distribution. This is achieved 
here using jittering (cf. Sect. 5.3.4). It is customary to add a zero-mean noise to the 
data in such cases, but here the data has many 0 values corresponding to days without 
rain. So as not to introduce negative values, we add instead a uniform noise between 
0.001 and 0.002 which does not have a mean of 0 but remains negligible with respect 
to the scale of the measurements. 
Fig. 16.1 Daily rainfalls from 1 January 1953 to 31 December 2005 in the watersheds (WTS) of
the Drac (before it joins with the Isère River) and Isère Rivers (at Grenoble)


The Drac and Isère watershed rainfalls are thus considered random variables
which we, respectively, denote X₁ and X₂, draws from which are written x₁ and x₂.
The latter therefore correspond to observed rainfalls in each watershed.

The univariate analysis of each marginal distribution will be performed using
the threshold exceedance approach (POT). We are going to look for the parameters
of generalized Pareto distributions (GPD) that best characterize the data. Remember
that, above a threshold, the cumulative distribution function Fⱼ(x) of a random
variable follows a GPD given by

\[ F_j(x) = 1 - \left[ 1 + \xi_j \left( \frac{x - u_j}{\sigma_j} \right) \right]^{-1/\xi_j}, \tag{16.1} \]

with j ∈ {1, 2}, where uⱼ is the chosen threshold, σⱼ the scale parameter, and ξⱼ the
shape parameter.

As a first step, it is necessary to choose the thresholds beyond which events will 
be considered extreme in each of the watersheds independently. Typically, experts 
consider 3–4 such events per year to be the norm, so we can heuristically set the
thresholds to be consistent with this: 39 mm and 34 mm for the Drac and Isère 
Rivers, respectively. These thresholds imply 188 exceedances for the Drac River 
(around 3.5 per year) and 200 for the Isère River (around 3.8 per year); see Figs. 16.2 
and 16.3. 
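
The link between a target number of events per year and the threshold can be made concrete with a short Python sketch. The helper `threshold_for_rate` is hypothetical and assumes tie-free data (which jittering ensures); the two rates are recomputed from the exceedance counts quoted above.

```python
def threshold_for_rate(values, n_years, events_per_year):
    """Threshold leaving about `events_per_year` exceedances per year.

    With k = round(n_years * events_per_year), returns the (k+1)-th largest
    value, so that exactly k values lie strictly above it when there are no ties.
    """
    k = round(n_years * events_per_year)
    ordered = sorted(values, reverse=True)
    return ordered[k]

# Exceedance rates implied by the counts quoted in the text (53 years of data).
rate_drac = 188 / 53     # about 3.5 per year
rate_isere = 200 / 53    # about 3.8 per year

# Toy check on synthetic values 0..99 spread over 10 "years".
toy_threshold = threshold_for_rate(list(range(100)), n_years=10, events_per_year=2)
```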

In order to ensure independence of the events thus selected, it is necessary to check 
that the distributions of the dates at which the thresholds are exceeded follow uniform 
distributions. For this, we show in Fig. 16.4 the empirical cumulative distribution 
functions of the exceedance dates, along with that of the uniform distribution. 
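
The visual comparison of Fig. 16.4 can be summarized by the largest gap between the empirical CDF of the exceedance dates and the uniform CDF. The following Python sketch computes this Kolmogorov–Smirnov distance; it is an illustration with hypothetical names and toy dates, not the procedure used to produce the figure.

```python
def ks_distance_to_uniform(times, start, end):
    """Sup distance between the empirical CDF of event times and the
    uniform CDF on [start, end]."""
    u = sorted((t - start) / (end - start) for t in times)
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))

# Toy examples on the period [0, 10]: regularly spaced versus clustered events.
d_regular = ks_distance_to_uniform([i + 0.5 for i in range(10)], 0, 10)
d_clustered = ks_distance_to_uniform([0.1, 0.2, 0.3], 0, 10)
```

A small distance is consistent with exceedance dates spread uniformly over the observation period, while clustering in time produces a large one.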

We therefore decide to retain these thresholds, and now turn to estimating GPD
distributions using maximum likelihood on the daily rainfall data. The parame-
ters obtained are shown in Table 16.1. The QQ-plots in Fig. 16.5 help to judge the
quality of the fits, which is relatively high in this case.
Fig. 16.2 Daily rainfalls above marginal thresholds for the watersheds (WTS) of the Drac and the
Isère Rivers at Grenoble
Fig. 16.3 Monthly distribution of the number of marginal threshold exceedances for the watersheds
(WTS) of the Drac River and the Isère River at Grenoble


16.3 Asymptotic Dependence Analysis 


The study of the conjunction of extreme rainfall events is of great importance when 
the intensity of such rainfalls increases simultaneously. To conduct such a study, 
we model the dependence structure between marginal distributions using a copula. 
Various tests can then be used to judge the usefulness of a multivariate statistical 
study of extreme rainfalls (see Chap. 9) including analyses of: 


• bivariate measurements in the empirical rank space;
• the pseudo-polar coordinates associated with the measurements;
• asymptotic independence using Falk & Michel’s test [262].

Fig. 16.4 Empirical cumulative distribution functions (CDF) of the dates at which the threshold
is exceeded in the watersheds of the Drac and the Isère Rivers at Grenoble. The uniform CDF is
shown as a solid line


Table 16.1 GPD parameters for the daily rainfall in the watersheds (WTS) of the Drac and Isère
Rivers

Parameters   Drac River WTS   Isère River WTS
u            39 mm            34 mm
ξ            −0.0046          −0.015
σ            12.341           10.008


16.3.1 Analysis of Bivariate Measurements in the Rank Space 


Figure 16.6 shows rainfall in the watersheds of the Drac and Isère Rivers in the phys-
ical space, as well as in the modified empirical rank space. The modified empirical
cumulative distribution function of each margin is given by

\[ \hat{F}_j(x) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbb{1}_{\{x_{ji} \le x\}}, \tag{16.2} \]

with j ∈ {1, 2}.

The point cloud in Fig. 16.6b is thus a realization from the empirical copula of the
sample of measurements. In this plot, we see that the rainfalls seem to be independent
in non-extreme situations (the data appear uniformly distributed over the square
[0; 0.5] × [0; 0.5]). On the other hand, there appears to be a certain dependence
between higher-quantile rainfall events. To better illustrate this, Fig. 16.7 plots, again
in the rank space, measured rainfalls (x₁, x₂) for high quantiles. We see that when there
is high rainfall, it tends to occur in both watersheds simultaneously.
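
Moving from the physical space to the rank space amounts to replacing each observation by its rescaled rank. The Python sketch below assumes the "modified" empirical CDF divides by n + 1 (a common convention that keeps values strictly below 1) and that there are no ties; the function name and data are hypothetical.

```python
def pseudo_observations(sample):
    """Map each value to rank / (n + 1), assuming no ties in the sample."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]

# Toy paired rainfalls (mm) for two watersheds, for illustration only.
x1 = [3.2, 0.1, 7.5]
x2 = [1.0, 0.4, 9.9]
rank_space = list(zip(pseudo_observations(x1), pseudo_observations(x2)))
```

Plotting the pairs in `rank_space` gives the kind of point cloud shown on the right of Fig. 16.6, free of the influence of the marginal distributions.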
Fig. 16.5 QQ-plots of the GPD fits for the rainfall data from the watersheds of the Drac River (top)
and the Isère River (bottom)


By limiting the data to events where both rainfall values exceed their respective 
thresholds (thus retaining 103 events), we obtain a realization of the extreme copula 
of our sample—see Fig. 16.7c. 


16.3.2 Analysis of Pseudo-Polar Coordinates 


Fig. 16.6 Graphical analysis of the asymptotic dependence of daily rainfall in the watersheds of
the Drac and Isère Rivers. Pairs (x₁, x₂) are shown in the physical space (left) and empirical rank
space (right)


For this analysis we first need to transform the random variables corresponding to
watershed rainfalls so that they have standard Fréchet distributions:

\[ V_j = -\frac{1}{\log F_j(X_j)}, \tag{16.3} \]

with j ∈ {1, 2}. It can be shown that the pair of random variables (V₁, V₂) follows a
bivariate distribution with standard Fréchet distributions as its marginals, and its cop-
ula is that of the pair (X₁, X₂). The latter is ensured by the fact that the transformation
using the standard Fréchet distribution is increasing and performed componentwise.
The standard Fréchet distribution has the cumulative distribution function

\[ \Psi(x) = \exp(-1/x), \quad \forall x > 0. \tag{16.4} \]


As F₁ and F₂ are unknown, we can replace them by their modified empirical
estimates F̂₁ and F̂₂. The rainfall data for the watersheds of the Rivers Isère and Drac
are thus transformed using (16.3) into data (V₁, V₂). Next, we introduce pseudo-polar
coordinates (W, R) defined by

\[ W = \frac{V_1}{V_1 + V_2}, \tag{16.5} \]

\[ R = V_1 + V_2. \tag{16.6} \]

Bivariate rainfall measurements, where one of the two values corresponds to a high
quantile, lead to very large values of V₁ or V₂ and thus R. This variable R thus
characterizes bivariate rainfall measurements in which at least one of the two values
corresponds to a high quantile.

Fig. 16.7 High empirically ranked rainfalls in the watersheds of the (a) River Isère and (b) Drac
River; (c) shows the empirical rank space where both rainfall values exceed their respective thresh-
olds


The variable W takes values in [0; 1]. When both V₁ and V₂ simultaneously
correspond to high quantiles, W will be close to 0.5. However, the reverse is not
true: the event W = 0.5 also contains all bivariate measurements (V₁, V₂) for which
the values of V₁ and V₂ correspond to the same quantile, whether it be extreme or
not. Furthermore, when the value of V₁ corresponds to a high quantile and that of V₂
does not, W will be close to 1. Inversely, when the value of V₂ corresponds to a high
quantile and that of V₁ does not, W will be close to 0 (Fig. 16.8).
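
The behaviour of W and R described above is easy to verify numerically. The Python sketch below applies (16.3), (16.5) and (16.6) to marginal probabilities (values of F₁ and F₂ in the open interval (0, 1)); the function names and inputs are hypothetical.

```python
import math

def to_frechet(u):
    """Standard Fréchet transform of a probability u in (0, 1), as in (16.3)."""
    return -1 / math.log(u)

def pseudo_polar(u1, u2):
    """Return (W, R) as defined in (16.5) and (16.6)."""
    v1, v2 = to_frechet(u1), to_frechet(u2)
    r = v1 + v2
    return v1 / r, r

w_both, r_both = pseudo_polar(0.99, 0.99)      # both extreme: W = 0.5, R large
w_first, r_first = pseudo_polar(0.999, 0.5)    # only the first extreme: W near 1
w_second, r_second = pseudo_polar(0.5, 0.999)  # only the second extreme: W near 0
w_mild, r_mild = pseudo_polar(0.3, 0.3)        # same non-extreme quantile: W = 0.5, R small
```

The last case shows why W alone is not informative: it must be read together with a large value of R.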

The analysis of asymptotic dependence is performed on extreme-valued bivari-
ate measurements, in the sense that at least one of V₁ and V₂ corresponds to a
high quantile. These measurements are found in the top-right quarter of Fig. 16.8.
The thresholds (u₁, u₂) were chosen using univariate analyses of the distributions
exceeding the marginal thresholds (see Table 16.1).

Figure 16.9 shows a histogram of the values of W corresponding to the extreme-
valued measurements associated with these thresholds. We see that there is no strong
peak close to 0 or 1 and that the density around W = 0.5 is far from negligible.
This suggests that there is indeed dependence between the marginal extreme values.
This analysis of the histogram of values of W thus appears to confirm that the
measurements are asymptotically dependent.


16.3.3 Falk and Michel’s Independence Test


Falk and Michel’s test [262] transforms the data into realizations of marginal Weibull
distributions:

\[ Z_j = \log F_j(X_j), \tag{16.7} \]

with j ∈ {1, 2}. These authors show that Z₁ and Z₂ are asymptotically independent if
and only if the conditional distribution of (Z₁ + Z₂)/c given Z₁ + Z₂ > c has the
cumulative distribution function F(t) = t², t ∈ [0, 1], when c tends to zero from below.
A statistical independence test can then be constructed based on a Kolmogorov–Smirnov
metric (cf. Example 4.4). Here, the p-value of this test is of the order of 10⁻¹⁵, which
allows us to strongly reject the hypothesis H₀ of asymptotic independence between
Z₁ and Z₂.
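
The idea of the test can be illustrated with a simplified Python sketch that computes only the Kolmogorov–Smirnov distance to t², not the full test with its p-value. The function name, the level c = −0.2 and the simulated samples are assumptions made for illustration.

```python
import math
import random

def falk_michel_distance(u1, u2, c):
    """KS distance between the empirical CDF of (Z1 + Z2) / c, restricted to
    pairs with Z1 + Z2 > c, and the CDF t**2 expected under asymptotic
    independence. Inputs are probabilities F_j(x_j) in (0, 1]; c must be < 0."""
    sums = [math.log(a) + math.log(b) for a, b in zip(u1, u2)]
    t = sorted(s / c for s in sums if s > c)
    n = len(t)
    if n == 0:
        raise ValueError("no pair exceeds the level c")
    return max(max((i + 1) / n - v ** 2, v ** 2 - i / n) for i, v in enumerate(t))

rng = random.Random(1)
a = [1 - rng.random() for _ in range(20000)]   # uniform on (0, 1]
b = [1 - rng.random() for _ in range(20000)]
d_indep = falk_michel_distance(a, b, c=-0.2)   # independent components
d_dep = falk_michel_distance(a, a, c=-0.2)     # identical components (full dependence)
```

For independent components the distance stays small, whereas under full dependence the conditional CDF is close to t and the distance to t² approaches 0.25.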


16.3.4 First Conclusions 


The above analyses confirm the absence of asymptotic independence between the
rainfall measured in the watersheds of the Drac and Isère Rivers at Grenoble, which
means that the theory of generalized extreme values can be used to model the tail of
the joint rainfall distribution:

\[ F(x_1, x_2) = \exp\left( \log\left[ F_1(x_1) \cdot F_2(x_2) \right] \cdot A\left( \frac{\log F_1(x_1)}{\log\left[ F_1(x_1) \cdot F_2(x_2) \right]} \right) \right) \tag{16.8} \]

for any x₁ > u₁ and x₂ > u₂, where Pickands’ function A(.), seen in Chap. 9 and
discussed further in Sect. 16.4.1, characterizes the dependence structure. The distri-
bution in (16.8) can be approximated, in practice, by

\[ F(x_1, x_2) = \exp\left( \log\left[ \hat{F}_1(x_1) \cdot \hat{F}_2(x_2) \right] \cdot A\left( \frac{\log \hat{F}_1(x_1)}{\log\left[ \hat{F}_1(x_1) \cdot \hat{F}_2(x_2) \right]} \right) \right), \tag{16.9} \]

where F̂₁ and F̂₂ are approximations to the tails of the marginal distributions, associ-
ated with the thresholds u₁ and u₂ and modeled by generalized Pareto distributions.
In our approach, these are considered the best possible estimations:

\[ \hat{F}_j(x) = 1 - \lambda_j \left[ 1 + \xi_j \left( \frac{x - u_j}{\sigma_j} \right) \right]^{-1/\xi_j} \tag{16.10} \]

with j ∈ {1, 2} and

\[ \lambda_1 = P(X_1 > u_1) \approx 1 - 0.9896, \qquad \lambda_2 = P(X_2 > u_2) \approx 1 - 0.9903. \]
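
The marginal tail approximation (16.10) can be evaluated directly from Table 16.1 and the exceedance probabilities above. The Python sketch below is only an illustration; the function name and the dictionary layout are hypothetical.

```python
def gpd_tail_cdf(x, u, sigma, xi, lam):
    """Tail approximation (16.10), valid for x > u and xi != 0."""
    t = 1 + xi * (x - u) / sigma
    if t <= 0:              # beyond the finite upper end point (xi < 0)
        return 1.0
    return 1 - lam * t ** (-1 / xi)

# Parameters from Table 16.1 and the exceedance probabilities quoted in the text.
drac = dict(u=39.0, sigma=12.341, xi=-0.0046, lam=1 - 0.9896)
isere = dict(u=34.0, sigma=10.008, xi=-0.015, lam=1 - 0.9903)

f_drac_at_threshold = gpd_tail_cdf(39.0, **drac)   # equals 1 - lambda_1
f_drac_60 = gpd_tail_cdf(60.0, **drac)
f_isere_60 = gpd_tail_cdf(60.0, **isere)
```

At the threshold the approximation returns 1 − λⱼ, and it increases towards 1 as x grows, as a distribution function should.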


16.4 Dependence Structure 


Two strategies for fitting the parameters of the bivariate distribution can be considered 
when studying the dependence between two dependent variables. The first consists of 
estimating together—in the same maximum likelihood procedure—the parameters 
of the marginal distributions and the dependence function. A model that results from 
this is called a total parametric model. The second strategy consists of estimating 
them separately, leading to a partial parametric model. 

Here, we follow the second approach, which allows us to choose marginal distri- 
butions consistent with our own criteria and the aims of the study. 


16.4.1 Pickands’ Dependence Function 


Pickands’ function A is connected to the dependence function l via the equation:

\[ l(V_2, V_1) = (V_2 + V_1) \, A\left( \frac{V_2}{V_2 + V_1} \right), \tag{16.11} \]

where (V₂, V₁) are given by (16.3). Formula (16.8) can then be rewritten as a function
of the variables V₂ and V₁, and the dependence function l:

\[ F(x_2, x_1) = \exp\left( \log\left\{ F_1(x_1) \cdot F_2(x_2) \right\} \cdot \frac{l(V_2, V_1)}{V_2 + V_1} \right) \tag{16.12} \]

for any x₂ > u₂ and x₁ > u₁. Estimation of the dependence function is performed
parametrically, starting by choosing an adequate model. Here, we propose to use the
logistic model studied already in Chap. 9, defined as

\[ l(V_2, V_1) = \left( V_2^{1/\alpha} + V_1^{1/\alpha} \right)^{\alpha}, \quad \text{with } 0 < \alpha \le 1. \tag{16.13} \]

The parameter α represents the strength of the dependence between extreme rainfall
events in the watersheds of the Isère and Drac Rivers at Grenoble. In particular, full
dependence corresponds to α → 0, and independence to α = 1.
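
The role of α can be checked numerically with the standard form of the logistic Pickands function, A(t) = (t^{1/α} + (1 − t)^{1/α})^α, plugged into the representation (16.8). The Python sketch below rests on that assumption about the logistic model; the function names and the marginal probabilities are hypothetical.

```python
import math

def pickands_logistic(t, alpha):
    """Logistic Pickands function, for 0 < alpha <= 1 and t in [0, 1]."""
    return (t ** (1 / alpha) + (1 - t) ** (1 / alpha)) ** alpha

def joint_cdf(f1, f2, alpha):
    """Representation (16.8) evaluated at marginal probabilities f1, f2 in (0, 1)."""
    s = math.log(f1) + math.log(f2)
    return math.exp(s * pickands_logistic(math.log(f1) / s, alpha))

independent = joint_cdf(0.9, 0.8, alpha=1.0)       # product of the margins
nearly_full = joint_cdf(0.9, 0.8, alpha=0.01)      # close to min(f1, f2)
intermediate = joint_cdf(0.9, 0.8, alpha=0.605)    # between the two extremes
a_mid = pickands_logistic(0.5, 0.605)              # equals 2 ** (0.605 - 1)
```

With α = 1 the joint distribution factorizes, and as α decreases towards 0 it approaches the minimum of the two margins, i.e., full dependence.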


16.4.2 Goodness-of-Fit Test 


The test used here, developed by Genest et al. [327], allows us to reject (or not)
the hypothesis that Pickands’ function belongs to a given parametric family, e.g.,
Clayton, Gumbel–Hougaard, Frank, etc. It uses the Cramér–von Mises criterion
$S_n$, measuring the distance between a theoretical Pickands function $A(t)$ and the
empirical estimate $A_n(t)$ of the Pickands function of a sample [270]. The latter, as
well as the parameters of the theoretical Pickands function, are estimated using the sample of
paired data from the two watersheds:

$$\{(x_{2,1}, x_{1,1}), \ldots, (x_{2,n}, x_{1,n})\}.$$

The Cramér–von Mises criterion is written as

$$S_n = \int_0^1 n \left|A_n(t) - A_{\theta_n}(t)\right|^2 dt.$$

After an initial estimation of the empirical and parametric Pickands functions, denoted
$A_n^0(t)$ and $A_{\theta_n}^0(t)$, respectively (conducted using the $(x_{2,i}, x_{1,i})_i$), and an
initial calculation of the Cramér–von Mises criterion (denoted $S_n^0$), the statistical
distribution of the latter can be evaluated using a parametric bootstrap procedure
[244], summarized as follows.

For sufficiently large $N$, and for each $k \in \{1, \ldots, N\}$:

1. generate a new sample $(x_{2,i}^k, x_{1,i}^k)_i$ using the estimated parameters of the
copula $A_{\theta_n}^0(\cdot)$,

2. calculate $A_n^k(t)$ and $A_{\theta_n}^k(t)$ using this new sample,

3. calculate $S_n^k$.


The corresponding p-value can be estimated using the probability that the Cramér–von Mises
criterion corresponding to the observed data is smaller than that of the
bootstrap samples, i.e.,

$$p\text{-value} = \frac{1}{N} \sum_{k=1}^{N} \mathbb{1}_{\{S_n^0 < S_n^k\}}.$$
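
The two numerical ingredients of this procedure, the Cramér–von Mises criterion and the bootstrap p-value, can be sketched as follows. This is a minimal illustration under our own assumptions. The function names are hypothetical, the integral is approximated by a midpoint rule, and the sampling and refitting steps of the bootstrap loop are left out.

```python
from typing import Callable, Sequence


def cramer_von_mises(a_emp: Callable[[float], float],
                     a_par: Callable[[float], float],
                     n: int,
                     grid_size: int = 1000) -> float:
    """Approximate S_n = n * integral over [0, 1] of (A_n(t) - A_theta(t))^2 dt."""
    step = 1.0 / grid_size
    total = 0.0
    for j in range(grid_size):
        t = (j + 0.5) * step  # midpoint of the j-th sub-interval
        total += (a_emp(t) - a_par(t)) ** 2
    return n * total * step


def bootstrap_p_value(s_obs: float, s_boot: Sequence[float]) -> float:
    """Fraction of bootstrap criteria S^k that exceed the observed criterion S^0."""
    return sum(1 for s in s_boot if s_obs < s) / len(s_boot)
```

In a full implementation, each element of `s_boot` would come from one pass through steps 1 to 3 above.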


Here, we have used the logistic extreme copula (16.13). The p-value associated 
with this in the goodness-of-fit test is 0.0004995. As this value is very small, we 
consider that the logistic copula is a reasonable approximation to the dependence 
between extreme rainfall events in the Drac and Isère watersheds at Grenoble.


16.4.3 Choosing the Final Model 


After having confirmed a lack of asymptotic independence between rainfall data 
pairs, we have chosen to model the tail of the joint rainfall distribution using formula 
(16.9). The latter is based on the marginal distributions of the observed data in each 
watershed (16.10) and the logistic dependence function (16.13). The parameters 
obtained for the marginal distributions are shown in Table 16.1, and the parameter $\alpha$
of the logistic distribution is estimated as 0.605 using maximum likelihood.


16.5 Model Use 


16.5.1 The Tail of the Joint Distribution 


Recall that the rainfalls in the watersheds of the Drac and Isère Rivers are represented
as a bivariate random variable $(X_1, X_2)$. Given the bivariate rainfall modeling justified
in the preceding paragraphs, $(X_1, X_2)$ has the cumulative distribution function
$F(x_1, x_2)$ given in (16.9). This modeling is valid only in the extreme region:

$$A = \{(X_1, X_2) \mid X_1 > 39\ \text{mm};\ X_2 > 34\ \text{mm}\}. \tag{16.14}$$

Fig. 16.10 Iso-quantile contours of the tail of the bivariate rainfall distribution
in the Drac and Isère River watersheds (axes: Drac rainfall and Isère rainfall, both in mm).
Quantiles corresponding to return periods of 10, 50, 100, 500, and 1000 years are shown


Under $F(x_1, x_2)$, the extreme region $A$ is attained with probability

$$P(A) = \lambda_1 + \lambda_2 - [1 - F(u_1, u_2)]. \tag{16.15}$$


For the parameters in our study, this gives P(A) ~ 3, 21.107. 
Figure 16.10 plots the iso-quantile contours, i.e., the set of bivariate quantiles of 
the same order p. 
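
The origin of (16.15) can be made explicit with a short inclusion–exclusion argument. The derivation below assumes that the first two terms of (16.15) are the marginal exceedance probabilities $\lambda_i = P(X_i > u_i)$, whose definition lies outside this section:

$$
\begin{aligned}
P(A) &= P(X_1 > u_1,\ X_2 > u_2) \\
     &= P(X_1 > u_1) + P(X_2 > u_2) - P(X_1 > u_1 \text{ or } X_2 > u_2) \\
     &= \lambda_1 + \lambda_2 - [1 - F(u_1, u_2)],
\end{aligned}
$$

since $P(X_1 > u_1 \text{ or } X_2 > u_2) = 1 - P(X_1 \le u_1,\ X_2 \le u_2) = 1 - F(u_1, u_2)$.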


16.5.2 Rainfall Distribution Conditional on the Conjunction 
of Extreme Values 


We define the bivariate random variable $Y$ as follows:

$$Y = (Y_1, Y_2) = X \mid X \in A. \tag{16.16}$$

We would like to obtain the cumulative distribution function of this variable. That is,
we wish to define the bivariate distribution of the rainfall in the two watersheds, given
that both are above their univariate thresholds $(u_1, u_2)$ (Fig. 16.11).

Fig. 16.11 Joint cumulative distribution function (top) and probability density (bottom) of Drac
and Isère watersheds (WTS) rainfalls, at Grenoble in the region $A$, conditional on being in an
extreme-valued situation

The cumulative distribution function of $Y$ is given by

$$F_Y(x_1, x_2) = \frac{F_X(x_1, x_2) - F_X(x_1, u_2) - F_X(u_1, x_2) + F_X(u_1, u_2)}{P(A)} \tag{16.17}$$

and its probability density by

$$p_Y(x_1, x_2) = \frac{\partial^2 F_Y(x_1, x_2)}{\partial x_1\, \partial x_2} = \frac{p_X(x_1, x_2)}{P(A)}. \tag{16.18}$$
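
The rectangle formula behind (16.17) can be sketched in Python for any joint cumulative distribution function supplied as a callable. The example is illustrative only. The function names are hypothetical, and `joint_cdf_toy` is a stand-in with unit Fréchet margins and logistic dependence, not the model fitted in this chapter.

```python
import math
from typing import Callable

JointCdf = Callable[[float, float], float]


def joint_cdf_toy(x1: float, x2: float, alpha: float = 0.605) -> float:
    """Toy joint CDF (assumption): logistic dependence with unit Frechet margins."""
    if x1 <= 0.0 or x2 <= 0.0:
        return 0.0
    v1, v2 = 1.0 / x1, 1.0 / x2  # 1 / inf evaluates to 0.0
    return math.exp(-((v1 ** (1.0 / alpha) + v2 ** (1.0 / alpha)) ** alpha))


def prob_region(cdf: JointCdf, u1: float, u2: float) -> float:
    """P(X1 > u1, X2 > u2), obtained by inclusion-exclusion."""
    inf = math.inf
    return cdf(inf, inf) - cdf(u1, inf) - cdf(inf, u2) + cdf(u1, u2)


def conditional_cdf(cdf: JointCdf, x1: float, x2: float, u1: float, u2: float) -> float:
    """F_Y(x1, x2) as in (16.17), for x1 >= u1 and x2 >= u2."""
    numerator = cdf(x1, x2) - cdf(x1, u2) - cdf(u1, x2) + cdf(u1, u2)
    return numerator / prob_region(cdf, u1, u2)
```

By construction, the conditional distribution function vanishes at $(u_1, u_2)$ and tends to 1 as both arguments grow, which gives a quick sanity check for any implementation.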


16.5.3 Marginal Rainfall Distribution Conditional on the 
Conjunction of Extreme Rainfalls 


Using Eq. (16.17), it is possible to calculate the distribution of extreme rainfall in
one watershed, given the conjunction of extreme rainfalls in both (Figs. 16.12 and
16.13). In other words, we can calculate the distribution of extreme values $Y_1$ in the
watershed of the Drac River (resp. $Y_2$ for the Isère River) given that there is strong
rainfall in both (above the thresholds defined in the univariate analyses).

Fig. 16.12 Cumulative distribution function (CDF; left) and probability density function (PDF;
right) of rainfall for the watershed of the Drac River, given the conjunction of extreme rainfalls in
both watersheds


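The corresponding cumulative distribution functions are given by

$$
\begin{aligned}
F_{Y_2}(x_2) &= \frac{F_{X_2}(x_2) - F_{X_2}(u_2) - F_X(x_2, u_1) + F_X(u_2, u_1)}{P(A)}, \\
F_{Y_1}(x_1) &= \frac{F_{X_1}(x_1) - F_{X_1}(u_1) - F_X(u_2, x_1) + F_X(u_2, u_1)}{P(A)},
\end{aligned} \tag{16.19}
$$

where the arguments of $F_X$ are ordered $(x_2, x_1)$, as in (16.12). As for the probability
densities, they are given by

$$
\begin{aligned}
p_{Y_2}(x_2) &= \frac{p_{X_2}(x_2) - \dfrac{\partial F_X}{\partial x_2}(x_2, u_1)}{P(A)}, \\
p_{Y_1}(x_1) &= \frac{p_{X_1}(x_1) - \dfrac{\partial F_X}{\partial x_1}(u_2, x_1)}{P(A)}.
\end{aligned}
$$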


16.5.4 Conditional Distribution of the Rainfall in One 
Watershed 


It is of interest to know the rainfall distribution in one of the watersheds in the event 
of a conjunction of extreme rainfalls in both, given the value of the rainfall in the 
other (Figs. 16.14 and 16.15). 

Fig. 16.13 Cumulative distribution function (CDF; left) and probability density function (PDF;
right) of rainfall for the watershed of the Isère River, given the conjunction of extreme rainfalls in
both watersheds

Fig. 16.14 Rainfall probability density function (PDF) in the watershed of the Drac River given a
conjunction of extreme rainfalls in both watersheds and the value of the rainfall in the watershed of
the Isère River: (left) 50 mm, (right) 80 mm

Thus, the rainfall distribution in the watershed of the Drac River (resp. Isère River) in
the event of a conjunction of extreme rainfalls in both, given the value of the rainfall
in the watershed of the Isère River, $\mathrm{rain}_{Is}$ (resp. Drac River, $\mathrm{rain}_{Dr}$), is given by the
new random variable $Z_1$ (resp. $Z_2$). These variables can be formally written as


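$$
\begin{aligned}
Z_2 &= Y_2 \mid \{Y_1 = \mathrm{rain}_{Dr}\} = X_2 \mid \{X_2 > u_2,\ X_1 = \mathrm{rain}_{Dr} > u_1\}, \\
Z_1 &= Y_1 \mid \{Y_2 = \mathrm{rain}_{Is}\} = X_1 \mid \{X_2 = \mathrm{rain}_{Is} > u_2,\ X_1 > u_1\}.
\end{aligned} \tag{16.20}
$$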

Fig. 16.15 Rainfall probability density function (PDF) in the watershed of the Isère River given a
conjunction of extreme rainfalls in both watersheds and the value of the rainfall in the watershed of
the Drac River: (left) 50 mm, (right) 80 mm


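The probability densities $p_{Z_2}$ of $Z_2$ and $p_{Z_1}$ of $Z_1$ are then given by

$$
\begin{aligned}
p_{Z_2}(x_2) &= \frac{p_Y(x_2, \mathrm{rain}_{Dr})}{p_{Y_1}(\mathrm{rain}_{Dr})}, \\
p_{Z_1}(x_1) &= \frac{p_Y(\mathrm{rain}_{Is}, x_1)}{p_{Y_2}(\mathrm{rain}_{Is})}.
\end{aligned} \tag{16.21}
$$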


Chapter 17
Conjunction of a Flood and a Storm


Alain Sibler and Anne Dutfoy 


Abstract The case study in this chapter involves the conjunction of two climate 
hazards: a flood (high river flow) and a storm (strong wind), which we label Flow and 
Wind Speed for simplicity. A combination of the two is likely to jeopardize industrial 
installations. The aim of the study is to calculate the probability of the conjunction 
of these extreme events and produce daily probability and annual frequency plots of 
such events. 


17.1 Introduction 


In this chapter, we focus on the bivariate analysis of the extreme values of the Flow 
and Wind Speed processes, after a brief initial summary of results from univariate 
marginal analyses. Data for the study come from the Meuse department of the Grand-Est
region of France. For more in-depth details on how the following univariate
results were obtained, readers should consult the previous chapters. The analysis is 
conducted according to the following steps: 


1. Univariate analysis of seasonality and trends in the flow and wind speed. 

2. Bivariate analysis of the correlation in extreme values of the two processes. 

3. Estimating the daily probability and annual frequency of the cumulative level 
exceedance for given return periods; annual frequency is either defined in terms 
of the mean number of co-occurring events per year or the mean number of co- 
occurring episodes per year (see Sect. 6.3.2). 

4. Plotting daily iso-probability and annual iso-frequency contours in the space of 
marginal return periods. 


The data consist of joint measurements of the daily mean flow and daily maximal 
instantaneous wind speed, and we analyze them using the R packages evir, evd, 
and texmex. 

Fig. 17.1 Daily wind speeds (1981–2012) in the Meuse department of the Grand-Est region of
France


17.2 Summary of Univariate Analyses 


17.2.1 Available Data 


In this case study, storms are represented in terms of daily maximal instantaneous
wind speed. The data run from 1 January 1981 to 22 May 2012 and were recorded
by Météo-France (Fig. 17.1). Though earlier data are available, we have not used them
here since the series is only considered homogeneous from 1981 onward. Flooding is
represented in terms of high daily water flow. These data were recorded for the Meuse river
between 1 January 1953 and 31 December 2013 and are available in the HYDRO
databank (Fig. 17.2).


17.2.2 Seasonality 


It can be shown that, by restricting the study to the December–February period, the
flow can be considered stationary. In addition, this period is the one which contains
the extreme values we wish to analyze. Similarly, a study of the wind speed data leads
to retaining the October–March period as the one which contains the highest wind
speeds. The period in which both are stationary is therefore December–February.

Fig. 17.2 Daily mean flow of the Meuse river (1953–2013)


17.2.3 POT Approach and GPD Models 


We have analyzed and then modeled the behavior of both marginal processes
in their extreme value regions, exactly as in previous chapters. For each process, a
peaks-over-threshold (POT) approach has been retained for modeling the tail of the
marginal distribution.

To proceed in this way, we must extract from the data the sequence of independent 
cluster maxima, where each cluster contains several successive values above a high 
threshold. The latter must be chosen in such a way that the sequence of cluster 
maxima follows a homogeneous Poisson process, i.e., has the following properties: 


1. the cluster maxima (one for each occurrence of a flood) are independent, 
2. the distribution of the time between successive cluster maxima is exponential, 
3. the annual number of cluster maxima follows a Poisson distribution. 
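
The extraction of cluster maxima can be sketched in Python as follows. This is an illustration under an assumption of ours, namely a simple runs rule in which a cluster ends once `run_length` consecutive values fall at or below the threshold. The function name and the `run_length` parameter are hypothetical and are not taken from the R packages used in the study.

```python
from typing import List, Sequence, Tuple


def cluster_maxima(values: Sequence[float],
                   threshold: float,
                   run_length: int = 1) -> List[Tuple[int, float]]:
    """Return (index, value) of the maximum of each cluster of exceedances."""
    maxima: List[Tuple[int, float]] = []
    best = None   # (index, value) of the current cluster maximum, if a cluster is open
    below = 0     # number of consecutive non-exceedances since the last exceedance
    for i, x in enumerate(values):
        if x > threshold:
            if best is None or x > best[1]:
                best = (i, x)
            below = 0
        elif best is not None:
            below += 1
            if below >= run_length:  # the cluster is closed
                maxima.append(best)
                best, below = None, 0
    if best is not None:  # a cluster still open at the end of the series
        maxima.append(best)
    return maxima
```

A larger `run_length` merges exceedances separated by short lulls into a single cluster, which favors the independence property listed above at the cost of fewer maxima.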


Remember that once thresholds have been chosen, stationarity in the extreme values
of the processes can be studied using the sequences of cluster maxima, displayed
graphically alongside statistical tests (Sect. 5.4.3.2). The most important of these are
the Mann–Kendall test, which checks for the presence or absence of a trend, and
the uniform distribution test on the set of dates at which cluster maxima occur. Other
tools are also commonly used for validating chosen thresholds, using for instance
the fact that the excesses of the cluster maxima over the thresholds should follow a
GPD. More details can be found in the previous chapters.

Table 17.1 Parameters of a generalized Pareto distribution fitted to the cluster maxima exceedances
over the threshold $q_s = 620$ m³·s⁻¹ of the flow of the Meuse river, along with 95% confidence
intervals
[Table body not recoverable; surviving column headers: standard deviation, $b_{\sup}$]

Table 17.1 gives the parameter estimates for the GPD model of the flow process,
along with confidence intervals estimated using the delta method (Theorem 4.4),
denoted by $[b_{\inf}, b_{\sup}]$. The threshold has been set at $q_s = 620$ m³·s⁻¹. As for the
study of the wind speed time series, the threshold has been set at $v_s = 26.9$ m·s⁻¹,
which leads to the following parameters for the GPD model of the cluster maxima
exceedances: $\xi = 0$ and $\sigma = 3.23$.
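
To make the role of these parameters concrete, the sketch below evaluates the survival function of a GPD for an excess $y$ over the threshold. It is a generic illustration with a hypothetical function name. With $\xi = 0$, as found for the wind speed, the GPD reduces to an exponential distribution with mean $\sigma$.

```python
import math


def gpd_survival(y: float, sigma: float, xi: float) -> float:
    """P(excess > y) for a GPD with scale sigma > 0 and shape xi, with y >= 0."""
    if y < 0.0 or sigma <= 0.0:
        raise ValueError("y must be non-negative and sigma positive")
    if abs(xi) < 1e-12:  # exponential limit of the GPD
        return math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0.0:  # beyond the upper endpoint, which exists only when xi < 0
        return 0.0
    return z ** (-1.0 / xi)
```

For example, `gpd_survival(35.0 - 26.9, 3.23, 0.0)` gives the probability that a wind speed cluster maximum above 26.9 m·s⁻¹ also exceeds 35 m·s⁻¹ under the fitted model.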


17.3 Bivariate Analysis of Flows and Wind Speeds 


We now turn to study the dependence between the extreme values of the Meuse 
river’s flow and the wind speed. This bivariate analysis takes place over the period 
in which both processes are stationary, i.e., daily measurements from December to 
February between 1 January 1981 and 22 May 2012. 


17.3.1 Bivariate Data 


Figure 17.3 shows paired measurements over the period of the study, first in the 
physical space (a), then in the rank space (b) using the empirical ranks. Figure 17.4 
shows regions of these spaces in which at least one component corresponds to a 90% 
quantile or above. Figure 17.5 shows pairs for which both components are over their 
90% quantiles. Lastly, Fig. 17.6 shows these measurements in the Fréchet space. 
These plots seem to indicate asymptotic independence of the flow and wind speed 
in extreme value settings, a result which will be confirmed by calculations in the 
following section. 


17.3.2 Asymptotic Dependence Analysis 


Fig. 17.3 Bivariate data points of the flow and wind speed between the months of December and
February over the period 1981–2012; (a) physical space and (b) rank space

Fig. 17.4 Bivariate data points of the flow and wind speed between the months of December and
February over the period 1981–2012; (a) high flow and (b) high wind speed

Fig. 17.5 Bivariate data points of the flow and wind speed between the months of December and
February over the period 1981–2012; only data for which both components have high ranks are
shown

Fig. 17.6 Bivariate data points of the flow and wind speed between the months of December and
February over the period 1981–2012; Fréchet space representation

Fig. 17.7 Plots of the functions $u \mapsto \chi(u)$ and $u \mapsto \bar\chi(u)$ for the bivariate flow and wind speed
measurements

Figure 17.7 displays the functions $\chi(u)$ and $\bar\chi(u)$. Their analysis helps to estimate
the asymptotic behavior of the vector of random flow and wind speed pairs (see
Sect. 9.2.3). Recall that for two correlated hazards $X_1$ and $X_2$, with marginal
distributions $F_1$ and $F_2$ and copula $C_F$,

$$\chi = \lim_{u \to 1^-} \chi(u), \qquad \bar\chi = \lim_{u \to 1^-} \bar\chi(u),$$

where

$$\chi(u) = 2 - \frac{\log P(F_1(X_1) < u,\ F_2(X_2) < u)}{\log P(F_1(X_1) < u)} = 2 - \frac{\log C_F(u, u)}{\log u}$$

and

$$\bar\chi(u) = \frac{2 \log P(F_1(X_1) > u)}{\log P(F_1(X_1) > u,\ F_2(X_2) > u)} - 1 = \frac{2 \log(1 - u)}{\log \bar C_F(u, u)} - 1,$$

with $\bar C_F(u, u) = P(F_1(X_1) > u,\ F_2(X_2) > u)$.
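
Empirical versions of $\chi(u)$ and $\bar\chi(u)$ can be sketched as follows. This is a minimal illustration rather than the implementation of the R packages mentioned earlier. The function names are hypothetical, the margins are replaced by ranks divided by $n + 1$, and the marginal probabilities are taken equal to $u$ and $1 - u$.

```python
import math
from typing import List, Sequence, Tuple


def pseudo_uniform(sample: Sequence[float]) -> List[float]:
    """Empirical probability-integral transform: rank / (n + 1)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def chi_and_chibar(x1: Sequence[float], x2: Sequence[float], u: float) -> Tuple[float, float]:
    """Empirical chi(u) and chi-bar(u) for a level 0 < u < 1."""
    a, b = pseudo_uniform(x1), pseudo_uniform(x2)
    n = len(a)
    p_below = sum(1 for s, t in zip(a, b) if s < u and t < u) / n
    p_above = sum(1 for s, t in zip(a, b) if s > u and t > u) / n
    if p_below == 0.0 or p_above == 0.0:
        raise ValueError("no joint observations at this level; choose another u")
    chi = 2.0 - math.log(p_below) / math.log(u)
    chibar = 2.0 * math.log(1.0 - u) / math.log(p_above) - 1.0
    return chi, chibar
```

For perfectly dependent data both quantities are close to 1, whereas asymptotically independent data give $\chi(u)$ decreasing toward 0 as $u \to 1$.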


We see that $\chi(u)$ tends to 0 as $u$ approaches the order 1 quantile, while
$\bar\chi(u)$ remains above 0; together, these suggest positively associated asymptotic
independence, i.e., random events exceeding each extreme threshold tend to favor
each other, with this dependence decreasing as the thresholds become more and
more extreme. The strength of the positive association is given by the limit value
of $\bar\chi(u)$ as we tend toward the order 1 quantile; here we estimate this as $\bar\chi = 0.26$.
Ledford and Tawn’s test, run on the scalar variable $T = \min(Z_1, Z_2)$ (where
$Z_i = -1/\log F_i(X_i)$, with $X_1$ corresponding to measures of the Meuse river’s flow,
$X_2$ to the wind speed, and $F_i$ the cumulative distribution function of $X_i$), allows us
to calculate the tail dependence coefficient $\eta$, which characterizes extreme values
of the variable $T$ (cf. Sect. 9.3.4).

Fig. 17.8 The value of $\eta$ in the Ledford and Tawn model as a function of the threshold

Figure 17.8 plots the estimated value of $\eta$ as a function of the threshold chosen for
the GPD modeling of $T$. We obtain $\eta \approx 0.63$. As we know that $\bar\chi = 2\eta - 1$, our two
estimated values appear to be coherent with each other. Furthermore, the confidence
intervals for $\eta$ do not contain the value 1, meaning that the asymptotic dependence
hypothesis can be rejected.
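
The coherence check above amounts to $2 \times 0.63 - 1 = 0.26$. The sketch below illustrates this relation together with a Hill-type estimate of $\eta$ computed from the $k$ largest values of a sample of $T$. It is an illustration under our own assumptions, with hypothetical function names, and it is not the GPD-based procedure behind Fig. 17.8.

```python
import math
from typing import Sequence


def hill_estimator(sample: Sequence[float], k: int) -> float:
    """Hill estimator of the tail index from the k largest values of a positive sample."""
    xs = sorted(sample)
    n = len(xs)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = xs[n - k - 1]  # the (k + 1)-th largest value
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def chibar_from_eta(eta: float) -> float:
    """Relation chi-bar = 2 * eta - 1 between the two dependence measures."""
    return 2.0 * eta - 1.0
```

In practice the estimate depends on $k$, which is why a range of thresholds is examined before a value is retained.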

Finally, Fig. 17.9 shows the histogram of a nonparametric reconstruction of the
spectral density of extreme-valued measurements. In the pseudo-polar space
$(R = Z_1 + Z_2,\ W = Z_1/R)$, where $(Z_1, Z_2)$ are the coordinates of the measurements in
the Fréchet space (cf. Sect. 9.2.2.4), we plot the histogram of $W$ for the large
values of $R$ (those above the 95% quantile of $R$). The “hollow belly” of this histogram
(most of the mass lies near $W = 0$ and $W = 1$) is supporting evidence for asymptotic
independence.


17.3.3 Physically Based Validation of the Statistical Results 


Fig. 17.9 Spectral density of $W$ in the pseudo-polar space of extreme-valued bivariate
measurements of the flow and wind speed

In this section, we propose a physical process-based argument, provided by EDF
hydrological experts, to validate the positively associated asymptotic independence
between the flow of the Meuse river and the wind speed that was estimated statistically
in the present study. Statistical links between floods and wind speed
at a given site of interest are indeed plausible. In fact, the rains at the origin of
the most extreme flood events result, like the wind, from atmospheric dynamics,
in particular the circulation of forced air masses at the synoptic scale¹
induced by the configuration of the pressure field. Several factors nevertheless
temper the strength of the expected statistical relationship and ultimately
lead to a low probability of the conjunction of these two types of extreme event.

Spatial shifts. In the case of the wind speed, it is its value at the site of interest which
is important to us, whereas for the rain, it is its cumulative value over the watershed
above the site of interest. This introduces a shift proportional to the length of the
watercourse. However, the wavelength of pressure anomalies is often greater than
2000 km, which limits the consequences of this spatial shift for typical watersheds.

Time lags. The flood peak value at the watershed outlet* is delayed (with respect
to the rain that caused it) by the time it takes for the water to flow to the point of
interest. Such delays depend, among other things, on the area of the watershed. In
the case of large ones (which is more or less the case for the watersheds of France’s
largest rivers), the lag between the rainfall and maximal flow will get close to or
exceed typical autocorrelation times of the wind speed variable.

Determinants of extreme rainfall. Unlike typical wind storms, which behave
relatively deterministically with respect to an atmospheric pressure field, and more
precisely its horizontal gradient, the causes of rainfall are more complex. Even in
large watersheds, it is not only the strength of the flow that counts but also its
moisture load, along with all of the causes of ascending air masses (low-level horizontal
convergence, forcings due to relief, advection, convection, upper troposphere
divergence, absolute or potential vorticity advection, etc.). There exist statistical links of
various strengths between these, but it must be remembered that even in the absence
of the spatial or temporal shifts mentioned above, the correlation between rainfall
and wind speed is far from perfect and, in addition, differs from one watershed to
the next.

¹ Meteorological and oceanographic term for phenomena occurring at the planetary scale.

Although the watershed of the Meuse river, in contrast with those of rivers
originating in France’s highest mountains, is often considered a floodplain, its
orography² still plays an important role in rainfall analyses. Indeed, the reliefs
of the Argonne (upstream of the Meuse river) and the Ardennes (right bank tributaries:
Chiers and Semoy rivers) stand against the very dominant westerly flows at this
latitude, despite their moderate altitude. In the case of strong ocean winds, the orographic
enhancement of rainfall is exacerbated. However, ocean storms are often associated with
a sustained strengthening of the mean jet stream³ associated with a positive phase
of the North Atlantic Oscillation (NAO). In the cold season in particular, repeated
rainy episodes induced by high winds coming off the Atlantic lead to waterlogged
land and subsequently to extremely high levels of the Meuse river, such as those which
occurred at the end of January 1995. At the flood peak, an ongoing storm may still
be generating strong or even extreme north to north-west winds. While nondeterministic,
a statistical link between wind and flood seems to characterize the nature
of these natural hazards.


17.4 Annual Probabilities and Frequencies 


Our aim is to calculate the probability of the conjunction of the two extreme hazards, 
each given in terms of the exceedance of high thresholds. These marginal thresholds 
are defined in terms of return periods. For reasons of confidentiality, the return peri- 
ods corresponding to the thresholds used in this chapter are not shown. However, 
Fig. 17.10 plots observed data pairs of the flow of the Meuse river and the wind speed
from 1981 to 2012 during the season being studied. The shaded area corresponds to
the zone for the thresholds used here.

Once the probabilities of such simultaneous events have been estimated, the annual
frequencies can be calculated, and it is then possible to plot daily iso-probability and
annual iso-frequency contours in the annual return period space $(N_D, N_V)$ (corresponding
to the Flow and Wind Speed processes).


² The domain of physical geography involving the formation and features of mountains.

³ Rapid and concentrated air currents found between the troposphere (where the temperature
decreases with altitude) and the stratosphere (where the temperature increases with altitude).

Fig. 17.10 Bivariate data for the flow and wind speed. The shaded zone corresponds to the area of
interest for the study


17.4.1 Probability of Simultaneous Extreme Events 


In Chap. 9, estimators for the probability of the conjunction of extreme events are
presented in detail; we invite the reader to refer back to that chapter for the technical
details of the methodology applied in the following.


17.4.1.1 Estimation of the Threshold Probability $p_s$


We use four different estimators of the parameter $\eta$ of the Ledford and Tawn model
(cf. Sect. 9.3.4):

• Hill’s estimator $\hat\eta_{he}$, determined directly using the data given as $T$;

• the maximum likelihood estimator $\hat\eta_{me}$, obtained by fitting a GPD to $T$;

• Hill’s estimator $\hat\eta_{hp}$, determined using the values of $T$ obtained by sampling from
the GPD;

• the maximum likelihood estimator $\hat\eta_{mp}$, obtained by fitting a GPD to
the values of $T$ obtained by sampling from the GPD.


In Fig. 17.11, the four estimators of $\eta$ are plotted as a function of the threshold
probability $p_s$. The final value we select for the threshold probability is the one at
which the distance between the four estimators is the smallest, over the stable part
of the plot, i.e., $p_s = 0.85$ here.
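
This selection rule can be sketched as a search for the threshold probability at which the spread between the estimator curves is smallest. The function name is hypothetical, and the candidate grid is assumed to have been restricted beforehand to the stable part of the plot.

```python
from typing import Sequence, Tuple


def min_spread_threshold(thresholds: Sequence[float],
                         estimates: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (threshold, spread) minimizing max - min across the estimator curves.

    estimates[m][j] is the value of estimator m at thresholds[j].
    """
    best_p, best_spread = thresholds[0], float("inf")
    for j, p in enumerate(thresholds):
        values = [curve[j] for curve in estimates]
        spread = max(values) - min(values)
        if spread < best_spread:
            best_p, best_spread = p, spread
    return best_p, best_spread
```

Other measures of distance between the curves (for instance the standard deviation) could be used in the same way.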

Fig. 17.11 Estimates of the parameter $\eta$ of the Ledford and Tawn model as a function of the
threshold probability $p_s$ (horizontal axis from 0.80 to 0.95)


17.4.1.2 Estimating p 


We have available to us the four estimators for the probability of the conjunction
of extreme hazards described in Sect. 9.4.3.4: $\hat p_e$, $\hat p_p$, $\hat p_r$, and a fourth estimator.
The first three of these are given in terms of the parameter $\eta$ of the Ledford and Tawn
model, itself calculated with the help of the four estimators defined above. We therefore
have a total of $3 \times 4 + 1 = 13$ usable statistical estimators.

For each conjunction of extreme events, defined by the respective thresholds for
the flow of the Meuse river and the wind speed, with return periods $(N_D, N_V)$,
we plot the estimators as functions of a parameter which must be set:
the thresholding parameter $p_0$, which is involved in all of the estimators. All of
the estimates of the probability of both hazards exceeding their thresholds can thus
be plotted on the same graph for comparison, as a function of the same threshold
probability $p_0$, ranging across the extreme quantiles 0.8–0.95. The value of $p_0$
is chosen to minimize the spread between the various estimators across the
considered range of values of $p_0$ in which all of the estimators are well-defined (i.e.,
non-zero).


Remark 17.1 For each pair of return periods $(N_D, N_V)$, the fourth estimator requires the
calculation of the extremal index $\theta$ of the random variable $T$ (not shown here).


We will not provide here all of the plots that have been used to compare estimators
for each conjunction of extreme events, but only the conclusions drawn from
analyzing them. We have chosen not to retain the fourth estimator when calculating the
probability of simultaneous threshold exceedance, preferring to use the estimator
$\hat p_r$, which is calculated with the help of the Hill estimator $\hat\eta_{he}$ of $\eta$ in the Ledford and
Tawn model, computed directly from the data $T$.

Fig. 17.12 Maps of the daily probability of the conjunction of extreme events: (a) iso-probability
contours estimated using the estimators $\hat p_r$ and $\hat\eta_{he}$, and (b) zones containing probabilities of the same
order of magnitude


Remark 17.2 A bivariate extreme-valued event means that both marginal extreme
events have occurred on the same day, during the overlapping seasonal period
(December–February).


Probability maps, as in Fig. 17.12, help to delimit boundaries between zones con- 
taining probabilities with the same order of magnitude. Figure 17.12 shows two ways 
to visualize the probability of the conjunction of extreme events. Given the scarcity 
of available data, it is not possible to give a very detailed view of the most extreme 
levels. 


17.4.2 Annual Frequencies 


The period of stationarity for both high flow and high wind speed processes is
December–February, containing a total of $n_0 = 90$ days. Let us now consider events
corresponding to high threshold exceedance, denoted by $\mathcal{E} = \{D > d, V > v\}$,
whose probabilities of occurring, across a range of extreme thresholds $(d, v)$,
were calculated in the previous section. The annual frequency of an event can be
defined in several ways, as described in Sect. 6.3.2. We illustrate here the behavior
of the two main definitions in our example.

The annual frequency, defined as the mean number of days per year when the
event $\mathcal{E}$ occurs, is written as follows:

$$F_{\text{days}}(\mathcal{E}) = n_0\, P(\mathcal{E}).$$

Fig. 17.13 Maps of the annual frequencies of the conjunction of extreme events (mean number
of days per year): (a) annual iso-frequency contours estimated using the estimators $\hat p_r$ and $\hat\eta_{he}$, and (b)
zones containing annual frequencies of the same order of magnitude


The annual frequency, defined as the mean number of episodes per year of the event
$\mathcal{E}$, is given by

$$F_{\text{episodes}}(\mathcal{E}) = n_0\, \theta_{(D,V)}\, P(\mathcal{E}),$$

where $\theta_{(D,V)}$ is the multivariate extremal index of $D$ and $V$, defined as the inverse of
the mean duration (length) of independent clusters corresponding to simultaneous
exceedance of the threshold $q_s$ for the flow of the Meuse river and $v_s$ for the wind
speed, during the overlapping time period and season studied in the joint analysis.
See Sect. 9.3.4 for more details.

The independence of simultaneous exceedance clusters is regulated by a redescent
criterion defined as the maximum of the redescent criteria of the two variables:
$\max(r_D, r_V) = 5$ days. We then find

$$\theta_{(D,V)} = 0.36,$$

i.e., a mean length $L_{(D,V)} = 1/\theta_{(D,V)}$ of clusters of simultaneous exceedances of around 3
days.
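
The two definitions of annual frequency, and the link between the extremal index and the mean cluster length, reduce to simple arithmetic. In the sketch below the function names are hypothetical. Any value passed as `p_event` is a placeholder, since the probabilities obtained in the study are not reported here.

```python
def annual_frequency_days(p_event: float, n_days: int = 90) -> float:
    """Mean number of days per year on which the event occurs: n0 * P(E)."""
    return n_days * p_event


def annual_frequency_episodes(p_event: float, theta: float, n_days: int = 90) -> float:
    """Mean number of episodes (independent clusters) per year: n0 * theta * P(E)."""
    return n_days * theta * p_event


def mean_cluster_length(theta: float) -> float:
    """Mean cluster length in days, the inverse of the extremal index."""
    return 1.0 / theta
```

With $\theta_{(D,V)} = 0.36$, `mean_cluster_length(0.36)` is about 2.8 days, consistent with the rounded value of 3 days quoted above, and the frequency in episodes is 36% of the frequency in days.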

These values make it possible to draw boundaries between zones of frequencies 
of a similar order of magnitude. Figures 17.13 and 17.14 plot the frequencies of the 
conjunction of extreme hazards in the ranges considered earlier. Given the scarcity of 
data, it is again not possible to give a very detailed view of the most extreme levels. 

Fig. 17.14 Maps of the annual frequencies of the conjunction of extreme events (mean number of
episodes per year): (a) annual iso-frequency contours estimated using the estimators $\hat p_r$ and $\hat\eta_{he}$, and (b)
zones containing annual frequencies of the same order of magnitude


17.5 Prospects 


A study of wind speeds and river flows can be built upon with the help of other data 
and analyses. Clearly, as for any study of extreme values, the more data we have, the 
more precise the results can be; it would therefore be of interest to repeat the study 
in several years, based on longer time series. Two other ways to move forward are 
given below. 

First, it would be possible to model the tail of the bivariate distribution of flows and 
wind speeds with the help of the Ledford and Tawn model. This bivariate distribution 
could then be used to model conditional scenarios and help answer the question: 
When the Meuse river floods, what is the likely distribution of wind speeds? The 
same question can be asked of the risk of flooding given a high wind speed. 

Second, it is possible to define an iso-quantile curve corresponding to all combi- 
nations of flows and wind speeds having the same criticality. This criticality would 
then allow us to define the return periods proposed by Kendall [328, 329, 334]. From 
this iso-quantile curve, it would then be possible to choose a specific combination 
to then be used as a starting point for robustness analyses of installations potentially 
affected by the joint appearance of flooding and storm conditions. This choice could 
be based on probabilistic considerations (e.g., the most likely to occur combina- 
tion) or practical ones (e.g., the combination most dangerous to the installation in 
question). 


Chapter 18
SCHADEX: An Alternative to Extreme Value Statistics in Hydrology


Emmanuel Paquet 


Abstract This chapter presents, in detail, a methodology used by EDF in many 
hydrological studies, which offers an alternative to the use of extreme value distri- 
butions. Based on multi-exponential weather patterns and hydrological models, this 
approach has proven to be robust and demonstrates that the response provided by 
statistical theory is not necessarily the most appropriate in practice. The existence of 
this approach demonstrates the need to continue the search for suitable statistical
models, and the comparison of real data with models derived from theory.


18.1 Introduction 


18.1.1 Objectives 


The SCHADEX¹ (Climate-Hydrological Simulation for the Assessment of Extreme
Flows) method [587] is a probabilistic method for extreme flood estimation at the
outlet of a watershed. It was developed by EDF and has been, since 2007, the
reference method for dimensioning EDF’s dam spillways*. Between 2006 and 2016, it
was used on around 200 watersheds in France and the rest of the world (notably,
Norway, Canada, and Central Europe).

SCHADEX is built around two models: 


• a probabilistic Multi-Exponential Weather Pattern (MEWP) model, which involves
a combination of exponential distributions fitted to samples of rainfall measurements
grouped by weather pattern. Its cumulative distribution function is of the
following form:

$$F(x) = \sum_{i \in \mathcal{T}} p_i \{1 - \exp(-\lambda_i x)\}, \tag{18.1}$$

where $\mathcal{T}$ is the set of weather patterns, $\lambda_i$ a scale parameter, and $p_i$ a weight
reflecting the relative frequency of weather pattern $i$ during a given period (typically, a
season);

• a rainfall-runoff stochastic simulation process based on the MORDOR hydrological
model [306], which allows rainfall episodes of any strength to be simulated
under the whole range of possible water status of a watershed.

¹ In French: Simulation Climato-Hydrologique pour l’Appréciation des Débits Extrêmes.


This method provides an estimate of the entire distribution (up to a return time 
of 10,000 years) of mean flows (for a relevant time step for the dynamics of the 
considered watershed), as well as that of peak flow*. 


18.1.2 History 


The estimation of extreme flows arriving at EDF’s dams has been carried out since the 
1960s using the GRADEX method, itself developed at EDF [2, 362]. The principle 
of the method is to deduce, using calculations based on the Gumbel-like behavior 
of the output of a simplified physical model (cf. Sect. 10.3), the asymptotics of the 
distribution of the water volume of rare floods with the help of a distribution modeling 
extreme rainfall. 

While remaining faithful to its underlying principles, the GRADEX method
has evolved significantly over the years, in particular thanks to improved computing
resources for processing rainfall measurements. It has become one of the go-to
methods worldwide for dimensioning dams with regard to hydrological risks [2].
However, its industrial application has mostly been carried out by EDF and other
major French engineering firms.

Starting in the mid-1990s, the method was regularly called into question by the
hydrological community and reviewers of extreme flood studies, for the following
reasons in particular.


• The return times associated with intense rainfall episodes in France by the
probabilistic GRADEX rainfall model sometimes seemed greatly overestimated.

• The use of the exponential distribution for modeling extreme rainfall was quite
controversial, in particular among statisticians, who thought it better to use the
heavier-tailed models coming from extreme value theory.

• When extrapolating flows, the assumption of a sudden switch from the distribution
of annual maximum flows to that of extreme rainfall (above a return time of 10-50
years, depending on the watershed) appeared simplistic and difficult to justify from
one study to the next.

• Effects related to snow (rain-on-snow abatement, depth of water resulting from
snowmelt) were difficult to take into account.

These observations led, in the early 2000s, to a detailed study of the methods
used to fit exponential distributions, especially in the Pyrenees, where
the extreme rainfall quantiles were the most underestimated. Starting in 2005,
subsampling with respect to weather patterns was introduced to try to form more
homogeneous rainfall samples, leading to the MEWP model, a composite distribution
of exponential distributions based on this sampling [303]. This model was then
statistically analyzed at the scale of France [304].

In parallel, a stochastic process for rainfall-runoff simulation was designed to
combine rainfall hazards and hydrological states in a quasi-exhaustive way. It was
based on the MORDOR hydrological model (which includes the effect of snow in
mountainous watersheds), the use of which in studies and operational forecasting
became the norm in EDF’s General Technical Division (DTG) in the early 2000s.
The first simulations giving access to the whole distribution of flows, including
extreme quantiles, were run in 2005.

The SCHADEX method was presented to the public in 2006 at a conference
organized by the French Hydrotechnical Society, and the first real-world studies were
published the same year. Since the end of 2007, it has been EDF’s go-to method for
dimensioning dams with respect to flood risk. Since that date, the majority of extreme
flood studies related to EDF’s dams, and those submitted to regulatory authorities, have
been carried out using this method.

Since its initial development, the method has been the subject of intensive scientific
work, notably in three Ph.D. theses supervised by EDF: methods, applications,
and sensitivity studies [302], an evaluation of SCHADEX in the non-stationary
setting [123], and estimation at ungauged sites [606].
SCHADEX was tested and compared with other industry methods in the French 
project EXTRAFLO [453] and the European FloodFreq COST action [505]. It has 
been applied in numerous settings outside France across a range of collaborations. 
Important work has been carried out in Norway to evaluate the method [99, 457], 
with the aim of eventually merging it into reference methods used in dam safety 
analyses. 


18.2 Methodology 


18.2.1 General Information 


SCHADEX is a probabilistic method that provides estimates of flow distributions
(means and peaks) at the outlet* of a watershed. These distributions are constructed
right out to high return times (10,000 years) using a stochastic rainfall-runoff
simulation process that simulates the response of a watershed to all types of rainfall
(from normal to extreme), starting from a full range of initial hydrological conditions
(soil saturation, snowpack, etc.). Stochastic simulation methods used to evaluate
extreme flows are typically divided into two categories [105]:


412 E. Paquet 


• Event simulation: the hydrological model is fed with reference rainfall scenarios,
generally chosen with the return times expected for the extreme flows. The watershed’s
initial conditions (as given in the model) are either fixed or described statistically.
A stochastic process is then used to model the combination of the two
hazards.

• Continuous simulation: the hydrological model is fed with a continuous rainfall
time series, either observed or generated stochastically. A statistical analysis of the
flows calculated by the model is then performed.


SCHADEX is a hybrid simulation method described as “semi-continuous”. Synthetic
rainfall events are independently added to a historical rainfall-temperature time
series (also called the climate chronicle or climate record).

Events are simulated independently, one by one, with the hydrological conditions
of the watershed updated from the climate record up to the point where the
simulated event is added. Using the climate record makes it possible to account
properly for the distribution of the watershed’s hydrological states, given that
they enter the simulation process with their true probabilities, obtained
directly from the climate record and, in particular, conditioned by the rainfall
occurring before the event.


18.2.2 Main Methodological Details 


18.2.2.1 Time Step Used 


The time step, typically between 4 and 24 h, defines the interval at
which the data (flow, rainfall, and temperature) are aggregated, the probabilistic
and hydrological models are defined, and most calculations are performed. This
time step needs to be close to the base time of typical floods observed within
the watershed. It can be deduced by analyzing the hydrographs* of the largest floods
observed in the watershed, or from a regional analysis if such data are not available.
Sometimes, only daily measurements are available and we are forced to work with these
for a watershed that would be better modeled using a shorter time step; this may lead
to a loss of robustness, notably when calculating peak flows.

To simplify what follows, we will suppose that the time step of the study is a daily 
one. 


18.2.2.2 The Shape Parameter of a Flood 


The shape parameter of a flood is the ratio between its peak flow (the maximum
instantaneous or hourly flow recorded during the flood) and the average flow calculated
over a given period. This value differs from one flood to another. This parameter,
which can subsequently be introduced into the model used to simulate extreme floods,
can be represented (and thus estimated) in two different ways.


1. Mean shape parameter $K_c$. This is the mean shape parameter over a collection of
significant floods measured at the outlet of the watershed (or of a neighboring one
of similar size with similar hydrological features). This collection is
made up of floods whose peak or mean flow exceeds a threshold. If we have around n
years of relatively continuous records, we generally select around 2n independent
floods to work with. The nature (shape) of these floods is usually little affected
by upstream developments and measurement problems. This choice focuses the
analysis on the largest floods affecting the whole watershed, and allows the
mean shape parameter estimated from these floods to be used
for extrapolation to extreme values.

2. The $K_r - K_c$ model. In some watersheds, a statistical link can be obtained
between the series of daily flows of a flood and the shape parameter of the hydrograph
of this same flood. If $j$ is the day on which the highest daily flood flow is observed,
the shape parameter of the series of daily flows is defined by

$$K_r = \frac{3\,Q(j)}{Q(j-1) + Q(j) + Q(j+1)},$$

where $Q(j)$ denotes the daily flow on day $j$. Over the set of floods chosen earlier,
we can then establish the linear relationship
$K_c - 1 = a\,(K_r - 1)$.

We consider that this model is representative (i.e., that there is a link between the
daily flood flow and the shape parameter) if the coefficient $a$ is above 0.25. If so,
the model can then be applied to floods simulated by SCHADEX instead of using
a constant shape parameter. Figure 18.1 shows a set of floods of the Arve river at
Arthaz (see Sect. 18.5) and the corresponding $K_r - K_c$ model.
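The two estimates of the shape parameter can be sketched in a few lines of Python. The text does not say how the linear relationship is fitted; the sketch below assumes an ordinary least-squares fit through the origin in the variables $(K_r - 1, K_c - 1)$. The flood values and function names are hypothetical.

```python
def daily_shape(q_before, q_max_day, q_after):
    """Shape parameter of the daily series: 3 Q(j) / (Q(j-1) + Q(j) + Q(j+1))."""
    return 3.0 * q_max_day / (q_before + q_max_day + q_after)


def fit_slope(kr_values, kc_values):
    """Least-squares slope a in K_c - 1 = a (K_r - 1), with no intercept (assumption)."""
    xs = [kr - 1.0 for kr in kr_values]
    ys = [kc - 1.0 for kc in kc_values]
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("all daily shape parameters equal 1: slope undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Hypothetical floods: (Q(j-1), Q(j), Q(j+1), peak flow), all in m3/s.
floods = [(100.0, 300.0, 200.0, 420.0), (80.0, 160.0, 80.0, 200.0), (50.0, 400.0, 150.0, 640.0)]
kr = [daily_shape(q0, q1, q2) for q0, q1, q2, _ in floods]
kc = [peak / q1 for _, q1, _, peak in floods]  # peak flow over daily flow
mean_kc = sum(kc) / len(kc)                    # option 1: constant shape parameter
a = fit_slope(kr, kc)                          # option 2: variable shape parameter
use_variable_model = a > 0.25                  # representativeness rule from the text
```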


18.2.3 Areal Precipitation 


Areal precipitation is the cumulative amount of rain that falls in a watershed or basin 
over a given time period (typically a day or year), assuming it is distributed uniformly 
across this whole watershed. This simplistic definition is strongly associated with 
the nature of the hydrological model used in SCHADEX, which does not include 
hydrological processes occurring at a smaller scale than a watershed. 

The value of this areal precipitation is calculated using a linear combination of
rainfall records within or near the watershed. The weight assigned to each measurement
station can be calculated, for example, using the Thiessen polygon method
(also known as Voronoi tessellation [574]), which calculates the proportion of the
watershed’s surface covered by each rain gauge, given its proximity to the others.

Fig. 18.1 Set of flood hydrographs for the Arve river at Arthaz, and the associated $K_r - K_c$ model


18.2.3.1 Central Rainfall/Centered Precipitation Events 


Flood simulation and its probabilistic implementation are based on a simple
“triangular” rainfall event over 3 days, made up of a central rainfall (of at least 1 mm)
and two adjacent rainfalls (one before and one after the central one) with smaller
values than the central one. This set of three is called a centered precipitation event
(Fig. 18.2).

Fig. 18.2 A centered precipitation event


The MEWP model (18.1) can be applied to central rainfalls extracted from the 
precipitation time series of a watershed. According to this probabilistic model, work- 
ing with central rainfall amounts to working on independent threshold exceedance 
data points. 
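As an illustration of the definition above, the following sketch extracts centered precipitation events from a daily rainfall series. It reads "smaller values than the central one" as strict inequalities, which is our assumption, and the rainfall series is invented for the example.

```python
def centered_events(rain, min_central=1.0):
    """Return (day index, before, central, after) for each centered event.

    A day is a central rainfall if it receives at least `min_central` mm and
    strictly more than both the previous and the following day.
    """
    events = []
    for t in range(1, len(rain) - 1):
        before, central, after = rain[t - 1], rain[t], rain[t + 1]
        if central >= min_central and before < central and after < central:
            events.append((t, before, central, after))
    return events


# Hypothetical daily watershed rainfall in mm.
rain = [0.0, 2.0, 10.0, 3.0, 0.0, 0.5, 0.2, 4.0, 4.0, 1.0]
events = centered_events(rain)
central_rainfalls = [central for _, _, central, _ in events]
```

By construction two central rainfalls can never fall on consecutive days, which is one way to see why the text treats them as independent threshold exceedances.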


18.2.3.2 Flow Rates After a Centered Event 


We can associate with each observed centered precipitation event the maximal 
observed flow (over a day or at the peak) a short time (0-2 days) after the central 
rainfall. In some sense, this corresponds to sampling the flow related to the rainfall 
that produced it. In watersheds where rainfall is the cause of the biggest floods, this 
distribution converges asymptotically to the distribution of annual maxima, which 
are essentially maximal flows following intense rainfall events. Figure 18.3 shows 
this distribution (and the corresponding annual maxima) for flow rates of the Tarn 
river at Millau. 

This distribution of flows is similar to the distribution of flows simulated by 
SCHADEX, and therefore provides an empirical distribution with which SCHADEX 
estimates can be compared over the range of observable return times. 


18.2.3.3 Climate Record 


This is a continuous watershed rainfall/air temperature time series, at least 15 years
long, on which the stochastic rainfall-runoff simulation is based. Supplied as
input data to the hydrological model, it allows the generation of a sequence of
the hydrological states found in the model (in particular, soil saturation and
snowpack), in which all “observable” states appear with their true probabilities,
resulting from the alternation of dry/wet and cold/warm periods in the record.

Fig. 18.3 The flow rate distribution following centered precipitation events


18.2.4 The MORDOR Hydrological Model 


MORDOR is a hydrological model, developed since the 1990s by EDF-DTG, that has 
been applied to several hundred watersheds for a range of different studies (general 
hydrology, inflow reconstruction, extreme floods, and climate change), as well as for 
operational hydrological prediction. Its main features are the following. 


• Full representation of the water cycle in mountain watersheds: evapotranspiration,
runoff, drainage flow, storage, snowmelt, etc.

• Reservoir model: the different hydrological “levels” of the watershed (subsurface
layer, vegetation, slopes, and water table) are represented by interconnected
reservoirs (see Fig. 18.4).

• Conceptual model: hydrological processes are described by simple mathematical
expressions which are not intended to give a rigorous physical characterization of
the phenomena (as Darcy’s law or the laws of hydraulics would do [418]).

• Lumped model: the watershed is considered to be a homogeneous entity. Only its
altitude distribution is taken into account, in order to deal with snow.


The input data for MORDOR are, in addition to the hypsometry (the land elevation
above sea level) and the area of the watershed, the mean rainfall over the watershed
and the air temperature (averaged or point values). These data are continuous series of
values, considered at the study’s time step.

Fig. 18.4 Schematic diagram of the MORDOR model (source [587])


In its most complete version, the model has more than 25 free parameters to
optimize. This can be done with the help of a modified genetic algorithm which
generates a large number of parameter sets and selects the best ones based on
their ability to reproduce the sequences of flows observed at the watershed’s outlet.
This ability is measured by an objective function that combines statistical
indices (Nash-Sutcliffe coefficient [547], Kling-Gupta efficiency [365, 440])
quantifying the discrepancy between observed and modeled flows, simultaneously on the
whole time series of flows, the inter-annual hydrological regimes, and the ranked flow
magnitudes.
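For reference, the two indices mentioned above can be computed as in the sketch below. It uses the standard definitions of the Nash-Sutcliffe coefficient and of the Kling-Gupta efficiency (correlation, variability ratio, and bias ratio); how EDF weights them inside its objective function is not described here, so no combined score is attempted. The flow values are hypothetical.

```python
import math


def nash_sutcliffe(obs, sim):
    """1 - sum of squared errors / sum of squared deviations of obs from its mean."""
    mean_obs = sum(obs) / len(obs)
    errors = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - errors / spread


def kling_gupta(obs, sim):
    """1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (std_obs * std_sim)  # linear correlation
    alpha = std_sim / std_obs      # variability ratio
    beta = mean_sim / mean_obs     # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical observed and modeled daily flows in m3/s.
observed = [12.0, 30.0, 85.0, 40.0, 18.0]
modeled = [10.0, 34.0, 78.0, 45.0, 20.0]
nse = nash_sutcliffe(observed, modeled)
kge = kling_gupta(observed, modeled)
```

Both indices equal 1 for a perfect simulation; a Nash-Sutcliffe coefficient of 0 means the model does no better than the mean of the observations.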


18.2.4.1 Stochastic Rainfall-Runoff Simulation


This step consists of simulating the response of the watershed to simulated centered
precipitation events of all magnitudes, starting from the whole range of hydrological
conditions modeled with the help of the climate record. This process is summarized
in Fig. 18.5. Several hundred simulated events are independently generated for each
day on which a centered precipitation event occurred, leading to around two million
simulated events in all. Generated events are drawn randomly from the model for
centered precipitation events shown in Fig. 18.2.
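The semi-continuous logic can be illustrated with a deliberately crude sketch. Everything in it is hypothetical: `toy_runoff` is a one-bucket stand-in with no relation to the actual MORDOR equations, `draw_event` is an invented event generator, and the equal weights it returns replace the event probabilities that SCHADEX attaches to each simulated event. Only the structure follows the text: one continuous pass over the climate record to obtain the watershed state before each event, then many synthetic events started from that state.

```python
import random

CAPACITY_MM = 100.0  # hypothetical soil storage capacity of the toy model


def toy_runoff(storage, rain):
    """Hypothetical stand-in for a hydrological model: one soil bucket.

    Runoff is the share of rain proportional to soil saturation; the rest
    fills the bucket, which then loses 5% of its content by drainage.
    """
    runoff = rain * storage / CAPACITY_MM
    storage = min(CAPACITY_MM, storage + rain - runoff) * 0.95
    return storage, runoff


def simulate_events(record_rain, event_days, draw_event, n_draws, seed=0):
    """Return (maximum daily runoff, weight) for every synthetic event."""
    rng = random.Random(seed)
    # Continuous pass over the climate record: storage at the start of each day.
    storage_at_start = []
    storage = 0.5 * CAPACITY_MM
    for rain in record_rain:
        storage_at_start.append(storage)
        storage, _ = toy_runoff(storage, rain)
    results = []
    for day in event_days:  # index of the observed central rainfall (day >= 1)
        for _ in range(n_draws):
            (before, central, after), weight = draw_event(rng)
            # Restart from the state observed just before the 3-day event.
            storage = storage_at_start[day - 1]
            flows = []
            for rain in (before, central, after):
                storage, runoff = toy_runoff(storage, rain)
                flows.append(runoff)
            results.append((max(flows), weight))
    return results


def draw_event(rng):
    """Hypothetical generator: central rainfall of 1 mm plus an exponential excess."""
    central = 1.0 + rng.expovariate(1.0 / 20.0)  # mean excess of 20 mm
    before = central * rng.random()
    after = central * rng.random()
    return (before, central, after), 1.0  # equal weights in this toy example


record = [0.0, 2.0, 10.0, 3.0, 0.0, 0.0, 6.0, 1.0]
results = simulate_events(record, event_days=[2, 6], draw_event=draw_event, n_draws=5)
```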

Fig. 18.5 SCHADEX simulation flowchart

The maximal daily mean flows obtained for each simulated flood are stored with
the probability of the associated simulated precipitation event. The latter is equal to
the product of the probability of the central rainfall $P_c$ (coming from the MEWP
model (18.1) for the season containing the day of simulation) with the probabilities
of the adjacent rainfalls (preceding and following the central rainfall), $P_{a^-}$ and $P_{a^+}$.
These may be supplemented by a formal description of the probabilistic link
between the simulated precipitations and the true rainfalls observed in the previous
days. For more details on the hypotheses involved in introducing these probabilities,
we suggest consulting [302, 587].
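Written out, and leaving aside the optional correction for previous rainfall, the probability attached to simulated event $k$ is a simple product. The symbol $w_k$ and the notation $\Pr(\cdot)$ are ours, and we read $P_c$, $P_{a^-}$ and $P_{a^+}$ as the rainfall amounts themselves rather than as probabilities; see [302, 587] for the exact formulation.

```latex
w_k \;=\; \Pr\!\big(P_c^{(k)}\big)\,\times\,\Pr\!\big(P_{a^-}^{(k)}\big)\,\times\,\Pr\!\big(P_{a^+}^{(k)}\big)
```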


18.2.4.2 Assessing Mean Flows and Peak Flows 


The cumulative distribution function of mean daily flows is then built using the set of
flood flows stored for each event and their associated probabilities (after normalizing
by the sum of all of the probabilities). This cumulative distribution function (CDF) is
plotted on a Gumbel probability scale (cf. Fig. 18.12 for the Tarn river) and compared
with the CDF of the observed flows following a centered event.

The distribution of peak flows can then be established in two ways: 


• using a mean shape coefficient and scaling the mean flow distribution;
• using a variable shape coefficient ($K_r - K_c$ model), by calculating it for each
simulated event then forming the corresponding CDF.
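The construction of the flow distribution from weighted simulated events can be sketched as follows. The flows, weights, and shape coefficient are hypothetical, and the function names are ours; only the normalization by the sum of the probabilities and the scaling by a mean shape coefficient come from the text.

```python
def weighted_cdf(flows, probabilities):
    """Return [(flow, cumulative probability)] sorted by increasing flow."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    cumulative = 0.0
    cdf = []
    for flow, prob in sorted(zip(flows, probabilities)):
        cumulative += prob / total  # normalize by the sum of all probabilities
        cdf.append((flow, cumulative))
    return cdf


def quantile(cdf, prob):
    """Smallest stored flow whose cumulative probability reaches `prob`."""
    for flow, cumulative in cdf:
        if cumulative >= prob:
            return flow
    return cdf[-1][0]


def scale_to_peak(cdf, mean_shape):
    """Peak-flow CDF obtained by multiplying every flow by a constant coefficient."""
    return [(flow * mean_shape, cumulative) for flow, cumulative in cdf]


# Hypothetical simulated daily flows (m3/s) with unnormalized event probabilities.
flows = [300.0, 100.0, 200.0]
weights = [1.0, 4.0, 3.0]
daily_cdf = weighted_cdf(flows, weights)
peak_cdf = scale_to_peak(daily_cdf, 1.5)
```

With the variable shape coefficient, the scaling step would instead multiply each simulated flow by its own coefficient before building the CDF.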


18.3 Running an Analysis 


SCHADEX studies should be conducted by researchers competent in statistical
hydrology (able to critically assess data and perform standard frequency analyses) and
hydrological modeling. A good hydro-climatological understanding of the watershed is
also a plus when it comes to choosing modeling options and critically reviewing results.
A successful application of the method requires an in-depth hydrological analysis of the
watershed in question and the data available. To obtain credible and robust extrapo- 
lations, all assumptions and data must be carefully examined before moving to the 
actual computational stage. The main steps involved in applying SCHADEX are the 
following. 


1. Data collection. Rainfall and air temperature measurements (in and near the 
watershed), flow rates (at its outlet or nearby—upstream and/or downstream), 
surface area and hypsometry (after a precise delineation of the watershed on a 
geographic information system or GIS), and historical flood data are brought 
together. Continuous (unbroken) series of at least 15 years of data are needed 
to robustly parameterize the probabilistic models and the hydrological model 
involved. 

2. Critical appraisal of the data. In particular, we focus on eliminating outliers,
detecting significant departures from stationarity, and identifying break points related
to a change of measurement device or of its location. For flow rates, we focus
on checking the stage-discharge rating (calibration) curves and on identifying major
changes of gauging sections. It is also necessary to correct for the influence of
installations on the measured flow in order to get back to an unaltered hydrological
signal.

3. Establishing the water balance. Consistently with the rest of the region, and taking
into account specific features of the watershed in question, we establish the
water balance of the watershed, i.e., evaluate the mean annual precipitation that
falls on it. In mountain watersheds, this volume of precipitation can be very
different from the volume measured at a downstream rainfall station.

4. Choosing the time step. Calculations and analyses are made at a time step
close to the typical duration of floods in the watershed. This duration can be
estimated using hydrographs of significant floods that occurred in the watershed.
In practice, we often work with a daily time step for watersheds from one hundred
to several thousand square kilometers, especially since available data (rainfall
and flow rate) are often no more frequent than daily.


5. Defining watershed rainfall. A watershed is modeled as a single object; we
therefore work with a variable corresponding to overall watershed rainfall (also
known as spatial rainfall), i.e., the average amount of precipitation
that falls on the watershed at each time step. This value is calculated using a
linear combination of the values observed at each rain gauge in the watershed.
The rainfall stations are chosen according to their representativeness and the
quality of their measurements. Their corresponding weights are optimized with
the help of an initial “guess” for the hydrological model, or calculated using the
Thiessen polygon method.


6. Probabilistic rainfall models. The following probabilistic models are fitted to
the watershed’s rainfall time series.

• The MEWP model (18.1) for extreme rainfall quantiles at the study’s time
step: the year is divided into three or four appropriate consecutive periods. The
aim is to group together months in which central rainfalls are statistically
homogeneous, while forming, as far as possible, periods with contrasting
rainfall risk. An algorithm built into the fitting tool suggests a division
into seasons, which can then be modified by trial and error. The threshold
quantile above which an exponential model is proposed for each set of months
(and each weather pattern) is chosen by a hydrologist in such a way that the
model characterizes the tail of each sub-population’s distribution well. The
threshold lies somewhere between the median and the 90% quantile
of each sub-population, the sub-populations being defined by weather pattern and season.

• Contingency table for adjacent rainfall events.

• Correcting for previous rainfall events: a statistical test is run to assess how long
the probabilistic dependence between observed centered rainfall events and
earlier rain lasts, and in which months of the year this dependence is significant.


7. Hydrological model parameter calibration. The hydrologist needs to choose
the least biased and most homogeneous periods of the flow rate time series
to calibrate the model, and in some cases put constraints on certain model
parameters which are poorly determined by the calibration. This step generally
requires a good level of experience in interpreting hydrological models, good
knowledge of specific features of the watershed, and a capacity to judge the
quality of hydrometric data.

8. Calculating the shape coefficient. A selection of flood hydrographs is conducted 
using a specific software tool that can access the flow data. It then performs 
an over-threshold selection with independence criteria between selected hydro- 
graphs. This threshold is chosen to give on average two floods per year over 
the whole record. Floods with strong measurement anomalies or very atypical 
hydrographs are manually removed. The mean shape parameter and the model
for the varying one ($K_r - K_c$) are then calculated for the remaining sample.

9. SCHADEX simulation. The rainfall-runoff stochastic simulation process pro- 
duces the distribution of extreme flows, based on the various models described 
earlier. A large number of simulated distributions are generated and compared 
with empirical distributions in order to evaluate extrapolations. For a given study 
defined with a daily time step, comparisons involve daily flow rate distribu- 
tions, mean and peak flow distributions over 3 days, then distributions defined 
by months and seasons. The hydrologist needs to identify any biases that may 
weaken the proposed extrapolations, and correct the options and parameters 
related to these biases. A range of model and option combinations should be 
tested. In general, most attention tends to be on fitting the MEWP model (18.1) 
and parameterizing the MORDOR hydrological model. 

10. Validation. Results are validated based on a number of criteria: 


• Consistency with the empirical distributions and with any available historical
data from in or near the watershed.

• Consistency with regional values obtained during earlier studies.

• The plausibility of the model parameters used (in particular in the hydrological
model).

• Robustness of the proposed solution with respect to the various options tested.
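Step 8 above relies on an over-threshold selection of independent floods, tuned to give about two floods per year. The tool used at EDF is not described here, so the following is only one possible sketch: it picks the largest daily flows one by one, rejecting any candidate closer than a minimum gap to an already selected flood. The gap of 5 days, the function name, and the flow series are all hypothetical.

```python
def select_independent_floods(daily_flows, n_years, per_year=2, min_gap_days=5):
    """Greedy selection of the largest flows separated by at least `min_gap_days`.

    Returns the sorted day indices of the selected floods; the implied threshold
    is the smallest selected flow.
    """
    target = int(round(per_year * n_years))
    by_decreasing_flow = sorted(
        range(len(daily_flows)), key=lambda t: daily_flows[t], reverse=True
    )
    selected = []
    for t in by_decreasing_flow:
        if len(selected) >= target:
            break
        if all(abs(t - s) >= min_gap_days for s in selected):
            selected.append(t)
    return sorted(selected)


# Hypothetical short series of daily flows (m3/s), treated as one "year".
flows = [1.0, 5.0, 9.0, 6.0, 1.0, 1.0, 1.0, 7.0, 8.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0]
floods = select_independent_floods(flows, n_years=1, per_year=2, min_gap_days=3)
threshold = min(flows[t] for t in floods)
```

In a real study, the automatic selection would still be followed by the manual removal of anomalous or atypical hydrographs mentioned in the text.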


The time required for a typical study is generally from a few days to a couple of 
months. It strongly depends on the quality of the data and on the watershed being 
studied. Small high-altitude watersheds are usually the most complex as they tend 
to be poorly served by measurement stations and are subject to more variable hydro- 
meteorological phenomena. 

Numerical calculations and data processing are performed with the help of a range
of tools and dedicated software packages (FORTRAN executables and R packages)
developed by EDF for applying the method in industrial settings. A great number of
graphical outputs are provided to the hydrologist to facilitate their analysis of results.

Fig. 18.6 The Tarn river’s watershed at Millau (triangles are the rainfall stations used to calculate
watershed rainfall)


18.4 Case Study I: Tarn River’s Watershed at Millau 


The Tarn river’s watershed at Millau (2170 km²) is located on the southwestern flank of
the Massif Central, about 85 km northwest of Montpellier (Fig. 18.6). Its median
altitude is 889 m, and its highest point is Mont Aigoual (1699 m), close to the 
sources of the Tarn. The eastern part of the watershed is on the western face of the 
Cévennes mountain range, characterized by granitic soil and steep slopes. In the rest 
of the watershed, the Tarn and its main tributaries (the rivers Jonte and Dourbie) flow 
through gorges that cut through the Causses limestone plateaus. 

The hydrological regime of the Tarn river is Mediterranean pluvial with clear
seasonal differences (cf. Fig. 18.7), including a pronounced low-flow period during
June-August and significant floods due to intense rainfall during September-May,
particularly in the October-January period.

The average annual flow calculated over 1969-2010 is 47 m³/s (corresponding
to a runoff depth of 684 mm when averaged across the watershed’s area), which
can be contrasted with an estimated mean annual rainfall of 1262 mm falling on the
watershed during that period. The largest flood during this observation period occurred
on November 5, 1994, with a peak flow of 2600 m³/s following a total of 270 mm of
rainfall over the preceding days.
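The conversion from mean flow to runoff depth can be checked by hand. The check below assumes a mean year of 365.25 days; the resulting runoff ratio of roughly 0.54 is our own derived figure, not one quoted in the study.

```latex
\frac{47\ \mathrm{m^3/s} \times 365.25 \times 86\,400\ \mathrm{s}}{2170 \times 10^{6}\ \mathrm{m^2}}
\approx 0.684\ \mathrm{m} = 684\ \mathrm{mm},
\qquad
\frac{684\ \mathrm{mm}}{1262\ \mathrm{mm}} \approx 0.54 .
```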

Fig. 18.7 The hydrological regime of the Tarn river at Millau

The daily flow rates of the Tarn river at Millau are available from 1965, and
hourly ones from 1988. The latter are of good quality, with few missing values.
The watershed’s rainfall is calculated using a linear combination of daily measurements
at seven rain gauges in or near the watershed (cf. Fig. 18.6). The weights used in
the linear combination have been tuned by minimizing the discrepancy between
simulated flows from MORDOR and observed ones.
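The linear combination itself is straightforward, as the sketch below shows. The gauge names, weights, and rainfall values are hypothetical (three gauges rather than seven, for brevity), and the weights are taken as given: neither the Thiessen construction nor the tuning against MORDOR simulations is reproduced.

```python
def areal_precipitation(gauge_rain, weights):
    """Weighted sum of gauge rainfall at each time step.

    gauge_rain: dict mapping gauge name to a list of rainfall values (mm)
    weights:    dict mapping gauge name to its weight
    """
    names = list(weights)
    n_steps = len(gauge_rain[names[0]])
    if any(len(gauge_rain[name]) != n_steps for name in names):
        raise ValueError("all gauge series must have the same length")
    return [
        sum(weights[name] * gauge_rain[name][t] for name in names)
        for t in range(n_steps)
    ]


# Hypothetical gauges and weights; tuned weights need not sum exactly to 1.
gauge_rain = {"A": [10.0, 0.0], "B": [20.0, 5.0], "C": [0.0, 10.0]}
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
watershed_rain = areal_precipitation(gauge_rain, weights)
```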

The distributions for extreme rainfall in the watershed were computed using the
MEWP model (18.1), with the year divided into four seasons: December-February,
March-May, June-September, and October-November. The MEWP distribution for the
October-November season, which has the highest rainfall risk, and its related marginal
distributions by weather pattern are shown in Figs. 18.8 and 18.9.

The risk of heavy rainfall events is quite characteristic of the Cévennes region,
with more pronounced heavy rainfall in autumn: the 1000-year watershed rainfall for
October-November is 309 mm/day, whereas it ranges from 155 to 220 mm/day in the other
seasons. During the high-risk season, note also that, logically for the specific climate
of this region, the most hazardous weather pattern is type 4 (Class 4 in the plots of
Fig. 18.9). In terms of annual risk, the watershed rainfall with a 1000-year return
period is estimated at 311 mm/day.

The parameters of the MORDOR hydrological model have been calibrated using
observed flow rates during 1971-2010. Due to the minimal role played by snow in this
watershed, the snow component of MORDOR was not activated here, leading to a much
smaller number of parameters to deal with. The statistical behavior of the obtained
model is quite satisfactory. The Nash-Sutcliffe coefficient [547] between the observed
and modeled flows is 0.89, and the coefficient of determination for flow variations is
0.80. The hydrological regimes and distributions of the observed and modeled flows are
shown in Figs. 18.10 and 18.11, highlighting that the hydrological model reproduces
well the observed seasonal effects in the watershed’s flows and the distribution of
its flood flows.

The SCHADEX simulation was performed over the climate record for 1971-2010.
A total of about 2 × 10⁶ events were simulated, with central rainfalls ranging from 1
to 490 mm: more precisely, 785 events were simulated for each day in the record
on which a central rainfall was observed. The CDF of the daily flow rate of the Tarn river
at Millau can then be constructed from the simulated flood events, and compared with
the observed one for events following episodes of rainfall over the period 1969-2010;
see Fig. 18.12. It should be emphasized that the proposed CDF, even if it appears as a
continuous line in the graph provided, does not come from an analytical formula but
is actually constructed using the 2 × 10⁶ simulated floods. It is very consistent with
the empirical distribution of observed flow rates, which is reassuring about the
robustness of the proposed extrapolation. The daily flow with a return time of 1000
years is estimated at 4795 m³/s.

One feature of SCHADEX is that its flood simulations cover almost entirely the range
of (extreme rainfall × hydrological state of the watershed) pairs at all
times of the year. Thus, for each month we can construct a cumulative distribution
function of simulated flood events and compare it with the corresponding empirical
one (see Fig. 18.13). These monthly distributions clearly show differences in the risk
of flooding in the Tarn river’s watershed depending on the time of year, with a clear
risk identified during October-May, peaking strongly during October-November.
We also note that even at the monthly scale, there is a tight fit with the empirical
distributions (perhaps slightly less so for October and December), which reassures
us that seasonality in the hydrological processes is reproduced well by the set of
models encompassed within SCHADEX (especially MEWP and MORDOR).

Fig. 18.9 Marginal distributions with respect to weather patterns for the rainfall in the watershed
of the Tarn river at Millau (during October-November)

Fig. 18.10 Hydrological regime of observed and modeled flow rates of the Tarn river at Millau

Fig. 18.11 Distribution of observed and modeled flows of the Tarn river at Millau

Fig. 18.12 Distributions of daily flow rates (Q24h) and peak flows (QX) estimated by SCHADEX
for the Tarn river at Millau

Fig. 18.13 Monthly distributions of empirical and simulated daily flows

The mean shape parameter of flood events was calculated using a set of 31 hydrographs
recorded at Millau between 1989 and 2010, involving floods with peak flows
of 334-2582 m³/s. These hydrographs, synchronized at the peak and mapped to 1-d,
are shown in Fig. 18.14 along with the mean hydrograph. The mean shape parameter
of these 31 flood events is 1.52.

The cumulative distribution function of peak flows of the Tarn river at Millau 
can be formed using that of daily flows simply by multiplying by the mean shape 
parameter 1.52 from above. This distribution can be compared with the empirical 
one of maximal peak flows following centered rainfall events during 1989-2010 
(Fig. 18.12). We see that the SCHADEX distribution is still very coherent with respect 
to the empirical one, suggesting that the extrapolation involved can be used with 
confidence. The peak flow with a return time of 1000 years is estimated at 7289 
m³/s, corresponding to a specific flow of 3.4 m³/s/km². 
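
The constant-ratio step described above can be sketched in a few lines of Python. This is only an illustration of the scaling, not the SCHADEX code: the function name and the daily-flow quantiles are hypothetical, and only the mean shape parameter 1.52 comes from the text.

```python
# Mean peak-to-daily ratio of the 31 Millau hydrographs (value quoted in the text).
MEAN_SHAPE_PARAMETER = 1.52


def peak_flow_quantiles(daily_quantiles, shape=MEAN_SHAPE_PARAMETER):
    """Scale daily-flow quantiles (m3/s) into peak-flow quantiles (m3/s).

    Assumes the same peak-to-daily ratio applies at every return time.
    """
    return [shape * q for q in daily_quantiles]


# Hypothetical daily-flow quantiles, for illustration only.
daily = [500.0, 1000.0, 2000.0]
peaks = peak_flow_quantiles(daily)
```

Read in reverse, and under the same constant-ratio assumption, the 1000-year peak flow of 7289 m³/s corresponds to a daily flow quantile of roughly 7289 / 1.52 ≈ 4795 m³/s.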

The study of extreme flows of the Tarn river at Millau is helped by the quantity and 
quality of precipitation and hydrometric data available for this particular watershed, 
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Fig. 18.14 1-d mappings of hydrographs of 31 floods of the Tarn river at Millau, synchronized at 
their peaks 


as well as the clarity of the hydrological processes at the origin of extreme flood events 
in this watershed: the risk of extreme rainfall is highly contrasted, very concentrated 
on the beginning of autumn, and a large amount of flood data is available for that 
period, helping the search for parameters of the hydrological model that lead to a 
good fit to the data. 


18.5 Case Study II: Arve River’s Watershed at Arthaz 


The watershed of the Arve river at Arthaz (1650 km²) is located in Haute-Savoie 
(Fig. 18.15). It contains in particular the Mont Blanc and Aiguilles Rouges massifs 
in its eastern part, as well as part of the Aravis mountain range. It stretches over 
a large range of altitudes, culminating at the peak of Mont Blanc (4809 m) with a 
median altitude of 1402 m, and its outlet is at an altitude of around 400 m. Glaciers 
cover about 7% of its surface, especially toward the east. Geologically, there is quite 
a difference between the eastern part, which is essentially crystalline, and the west, made up 
of the limestone Aravis and Bornes massifs. The main tributaries of the Arve river 
are, from upstream down, the Diosaz, Bon-Nant, Giffre, and Borne rivers. 

The Arve has a snow-glacier hydrological regime (Fig. 18.16), with a maximum 
in June entirely due to snowmelt and flows that usually remain steady over the year 
thanks to rainfall that is well spread out. The moderate median altitude of the western part of 
the watershed (the part most exposed to precipitation) also favors this steady flow. 

The mean annual flow calculated over 1991-2012 (the period in which available 
hydrometric data are the most reliable) is around 65 m³/s (i.e., corresponding to a 
height of 1246 mm when averaged out over the watershed’s area), with the watershed’s 
mean annual precipitation estimated at 1737 mm. In terms of daily flows, 
the largest one over the period of available data (1961 and after) was 535 m³/s on 
September 22, 1968. 
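
The conversion from a mean flow to an equivalent water height over the watershed can be checked with a short Python sketch. The function name is ours, and a year of 365.25 days is assumed; because the flow is quoted as "around 65 m³/s", the result is close to, but not exactly, the 1246 mm given above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed mean year length


def runoff_depth_mm(mean_flow_m3s, area_km2):
    """Convert a mean annual flow (m3/s) into a water height (mm/year)."""
    volume_m3 = mean_flow_m3s * SECONDS_PER_YEAR  # annual volume
    area_m2 = area_km2 * 1.0e6                    # km2 -> m2
    return 1000.0 * volume_m3 / area_m2           # m -> mm


depth = runoff_depth_mm(65.0, 1650.0)  # about 1243 mm with the rounded flow
runoff_ratio = depth / 1737.0          # share of mean annual precipitation
```

With these rounded inputs, a little over 70% of the mean annual precipitation leaves the watershed as flow at Arthaz.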
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Fig. 18.15 Arve river’s watershed at Arthaz (triangles are the rainfall stations used to calculate 
watershed rainfall) 
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Fig. 18.16 Snow-glacier hydrological regime of the Arve river at Arthaz 


Daily flow rates for the Arve river at Arthaz are available since 1961, and hourly 
ones since 1994. This hydrometric time series is relatively complete but suffers 
from significant measurement biases linked to the upper part of the calibration curve 
(which converts water heights to flows). The watershed’s rainfall can be calculated 
over the period 1966-2013 using a linear combination of the daily measurements taken 
at the four rainfall stations (see Fig. 18.15). As for the Tarn river 
example, the four weights are optimized with help from the hydrological model. 
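
As a minimal sketch of this linear combination, the Python function below computes a daily watershed rainfall from four station readings. The weights and readings are hypothetical placeholders; in the study the weights come out of the optimization with the hydrological model, which is not reproduced here.

```python
def watershed_rainfall(station_rain_mm, weights):
    """Daily watershed rainfall (mm) as a weighted sum of station readings."""
    if len(station_rain_mm) != len(weights):
        raise ValueError("one weight is needed per station")
    return sum(w * r for w, r in zip(weights, station_rain_mm))


# Hypothetical weights and one day of hypothetical readings (mm).
weights = [0.3, 0.3, 0.2, 0.2]
readings = [12.0, 8.0, 20.0, 5.0]
rain = watershed_rainfall(readings, weights)  # 3.6 + 2.4 + 4.0 + 1.0
```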

Modeling extreme rainfall in the watershed using MEWP distinguishes four different 
seasons: January-March, April-May, June-August, and September-December. 
The distributions for the latter and for April-May, as well as the corresponding data, 
are shown in Fig. 18.17. The risk of severe rainfall is maximal during September-December, 
with a value of 157 mm for a 1000-year return time. The value ranges from 
96 to 127 mm in the other periods. Seasonal differences are therefore less pronounced 
than those in the Tarn example. In the September-December and January-March 
periods, type 2 weather patterns (known as stationary fronts) bring the highest risk 
of severe rainfall. In terms of annual risk, the 1000-year return level is estimated at 
159 mm/day. 
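
To put a 1000-year return time in perspective, the sketch below converts it into the probability of at least one exceedance over a given horizon. It assumes that a return time of T years means an annual exceedance probability of 1/T and that years are independent; the function name is ours.

```python
def prob_at_least_one_exceedance(return_time_years, horizon_years):
    """Probability that the T-year level is exceeded at least once in n years."""
    annual_probability = 1.0 / return_time_years
    return 1.0 - (1.0 - annual_probability) ** horizon_years


# Chance of seeing the 1000-year daily rainfall at least once in a century.
p_century = prob_at_least_one_exceedance(1000.0, 100)
```

Under these assumptions the 1000-year level has a little under a 10% chance of being exceeded at least once over 100 years.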

The MORDOR parameters were calibrated using the observed flows over the 
years 1991-2012, because the hydrometric time series seemed to have the highest 
quality for this period. Obviously, snow (and ice) accumulation and melting processes 
were taken into account in this model. The statistical performance of the model is 
acceptable: the Nash-Sutcliffe coefficient [547] between the observed and modeled 
flows was 0.85, similar to the coefficient determining flow variations. The regimes and 
distributions of the observed and modeled flows are shown in Figs. 18.18 and 18.19 
and they highlight that the hydrological model is capable of reproducing seasonal 
flows and the distribution of the most extreme ones (due to high rainfall and the peak 
period for snowmelt). The largest values appear to be slightly underestimated by the 
model here, but this parameterization of the model is preferred over others based on 
a range of criteria (plausibility of parameters, robustness of the extrapolation, etc.). 
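
For reference, the Nash-Sutcliffe coefficient used above compares the squared model errors with the variance of the observations around their mean. The Python sketch below implements this standard definition; the function name and the four-value series are illustrative only and are not data from the study.

```python
def nash_sutcliffe(observed, modeled):
    """Nash-Sutcliffe efficiency: 1 for a perfect fit, 0 when the model
    does no better than the mean of the observations."""
    if len(observed) != len(modeled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_errors = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0.0:
        raise ValueError("observations are constant; coefficient undefined")
    return 1.0 - squared_errors / spread


# Illustrative flows (m3/s), not taken from the Arve record.
obs = [10.0, 20.0, 30.0, 40.0]
mod = [12.0, 18.0, 33.0, 39.0]
nse = nash_sutcliffe(obs, mod)  # 1 - 18/500
```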

The SCHADEX simulation was then run on the climate record for the years 1962- 
2012. Again, 2016 events were simulated with central rainfalls taking values between 
1 and 240 mm. 

The CDF of the daily flow rate of the Arve river at Arthaz was then constructed 
from the simulated flood events, and compared with the observed one for events 
following episodes of rainfall over the period 1961-2013; see Fig. 18.20. At this 
level of the study, all of the available hydrometric information was used (while only 
the period 1991-2012 was used for model calibration) in order to better evaluate the 
robustness of the proposed extrapolations. 

The results are coherent with the empirical distribution of observed flows, though 
the extrapolation moves away from it for 10-year return periods and longer. To clar- 
ify this, we can take a closer look at the simulated seasonal distributions and the 
corresponding recorded data (Fig. 18.21). We see that the hydrological risk does not 
appear to vary much across seasons up to a 20-year return time; however, extrapolating 
to higher return times does confirm that the September-December period is 
the high-risk one. 
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Fig. 18.17 MEWP 
distributions of rainfall in the 
Arve river’s watershed at 
Arthaz (April-May and 
September-December) 
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Fig. 18.18 Hydrological regime of the Arve river at Arthaz 
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Fig. 18.20 Distributions of daily flow rates (Q24h) and peak flows (QX) estimated by SCHADEX 
for the Arve river at Arthaz 
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Fig. 18.21 Distributions of daily flow rates for each season estimated by SCHADEX for the Arve 
river at Arthaz 
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The concordance with the empirical seasonal distributions is good, in particular for 
the at-risk season. For January-March, four events have empirical return times which 
deviate from the distribution simulated by SCHADEX. The daily flow corresponding 
to a 1000-year return time is estimated at 1194 m³/s. 

In this study, the calculation of peak flows is performed using the variable shape 
parameter (K, — K.) model based on a set of 35 hydrographs recorded at Arthaz 
between 1994 and 2014, with peak flows in the range 272-640 m³/s. This model is 
written as 

K,-—1=0.61(K, — 1). 


The ratio of the peak flow over the mean daily flow is thus calculated for each simulated 
flood event, and depends on the dynamics of daily flows. The CDF of peak 
flows of the Arve river at Arthaz has thus been produced by simulation and subsequently 
compared to the empirical distribution of maximal peak flows following 
centered rainfall events over the period 1994-2013 (Fig. 18.20). The coherence 
between the SCHADEX distribution and the empirical one is acceptable. The peak 
flow with a 1000-year return time is estimated at 1888 m³/s (i.e., a specific flow of 
1.14 m³/s/km²). The ratio of the peak flow quantile over the daily flow quantile increases with 
the return time T: it is equal to 1.42 when T = 10 years, rising to 1.58 for 
T = 1000 years. 
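
The figures quoted for the 1000-year return time can be cross-checked with a few lines of Python. The variable names are ours; the three inputs are the watershed area and the two quantiles given in the text.

```python
AREA_KM2 = 1650.0     # watershed area at Arthaz
DAILY_Q1000 = 1194.0  # 1000-year daily flow (m3/s)
PEAK_Q1000 = 1888.0   # 1000-year peak flow (m3/s)

# Specific flow: peak flow per unit of watershed area (m3/s/km2).
specific_flow = PEAK_Q1000 / AREA_KM2

# Ratio of the peak flow quantile over the daily flow quantile.
peak_to_daily = PEAK_Q1000 / DAILY_Q1000
```

Rounded to two decimals, these give 1.14 m³/s/km² and 1.58, in agreement with the values stated above.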

Estimating the risk of extreme floods of the Arve river is complicated 
by several factors: 


• varying quality in the hydrometric time series, depending on time and the magnitude 
of observed values; 

• a risk of extreme rainfall that does not change much during the year; 

• high recorded floods, caused simultaneously by strong rainfall and snowmelt in 
varying proportions from one occurrence to the next; treating such data is a real 
challenge for the hydrological model and for selecting its parameters robustly. 


Nevertheless, SCHADEX does allow us to provide a detailed view of hydrological 
risks by attempting to characterize seasonal effects in the statistical and hydrological 
processes involved as well as possible. This ability to bring together a large quantity of 
hydro-climatological information for making robust extrapolations is without doubt 
one of the most important features of the method. 


Appendix A 
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Accretion. Accretion refers to the formation and growth of a body, structure or 
object by the addition and/or agglomeration of matter. 

Albedo. Albedo is the reflective power of a surface, defined as the ratio of the 
reflected light energy to the incident light energy. 

Anchor variable. An anchor variable is a variable about which expert knowledge 
can be added using a decision heuristic. The nature and order of magnitude of this 
variable is known to the expert, who can use its properties to state: “the variable X 
takes values between x, and x”. An observable variable, i.e., one that is directly 
accessible to the expert, often has good anchoring properties. 

ANEMOC. The Atlas Numérique d’États de mer Océanique et Côtier (ANEMOC: 
http://anemoc.cetmef.developpement-durable.gouv.fr) is a database for oceanic 
data (climate, bathymetry, wave energy, etc.) developed by EDF-LNHE with the 
help of CETMEF, built using retrospective simulation (with the TOMAWAC soft- 
ware) over a period of around 24 years (1979-2002) for the Atlantic, Channel, 
and North Sea coasts, and 30 years (1979-2008) for the Mediterranean Sea. 

Asymptotic behavior. The asymptotic behavior of a mathematical object $u(t)$ 
depending on a value $t$ means a description of what happens to this object as 
$t$ tends to a certain limit. If $t = n$, where $n$ is the number of observed values, and 
$u(t) = \hat{\Sigma}_n$ is a statistical estimator of the quantity $\Sigma$, then its asymptotic properties 
characterize the nature of its convergence or divergence to a value $\Sigma_0$ when 
$n \to \infty$; the function $n \mapsto \hat{\Sigma}_n$ tends to a (straight line) asymptote $n \mapsto \Sigma_0$. 

ASN. The French Autorité de Sûreté Nucléaire (ASN) is the independent administrative 
authority whose role, on behalf of the state, is to oversee civil 
nuclear safety and radiation protection, and to disseminate information on risks 
associated with nuclear activity. 

Baseflow. The baseflow is the minimal annual flow of a watercourse, correspond- 
ing to a dry period due to drought or water extraction (e.g., for agricultural use). 

CFBR. The Comité Français des Barrages et Réservoirs (CFBR; http://www.barrages-cfbr.eu) 
is an association composed of members representing the administration, 
national societies, public establishments, local authorities, businesses, 
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and engineering companies, which is coordinated by EDF and whose role it is, 
through various internal working groups, to establish technical criteria and rec- 
ommendations for the design, implementation, monitoring (etc.) of dams and 
tanks/reservoirs (e.g., retention basins) in France. 

Chop. Waves generated by local wind along an expanse of water (sea, estuary, 
lake, etc.). 

Cluster. A cluster is a group of consecutive and correlated values from the same 
process, sharing a certain feature. For example, a cluster of exceedances can be 
seen as a wave of values above a given threshold. 

CTPBOH. The Comité Technique Permanent des Barrages et Ouvrages 
Hydrauliques (CTPBOH) is a consultative body of the French Ministry of Environment, 
created in 1996 and composed of experts chosen according to their particular 
technical skills in the field of hydraulic structures. This body is consulted 
in advance about all construction or modification projects related to large dams 
and dikes. 

Decision heuristic. A decision heuristic is a cognitive shortcut used by an indi- 
vidual to make a quick judgement (or decision) about a phenomenon known at 
least partially to them, based on their knowledge of the situation. Untangling the 
nature of the heuristic used (for instance, whether it is universally valid or not, 
or linked to personal experience, etc.) can help to protect against potential biases 
contained within it. See [253, 335] for more details and examples. 

Declustering. A cluster is, in the context of extreme value statistics, a sequence 
of consecutive exceedances of a threshold that can be considered as part of the 
same wave. Declustering is a procedure to replace such dependent data with a 
single data point (for example, the maximum exceedance) in order to produce an 
independent sample. 
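
A simple way to carry out such a procedure is "runs" declustering, sketched below in Python. This is one common choice among several and is shown here only as an illustration: the function name, the `min_gap` parameter, and the sample series are assumptions, and each cluster is summarized by its largest observed value.

```python
def decluster(values, threshold, min_gap=1):
    """Return one maximum per cluster of exceedances.

    A cluster ends once `min_gap` consecutive values at or below the
    threshold have been observed after an exceedance.
    """
    maxima = []
    current_max = None  # largest value of the cluster being built
    gap = 0             # consecutive non-exceedances since last exceedance
    for v in values:
        if v > threshold:
            current_max = v if current_max is None else max(current_max, v)
            gap = 0
        elif current_max is not None:
            gap += 1
            if gap >= min_gap:
                maxima.append(current_max)
                current_max = None
    if current_max is not None:  # close a cluster still open at the end
        maxima.append(current_max)
    return maxima


series = [1, 5, 2, 7, 1, 1, 6, 1]
separate = decluster(series, threshold=4, min_gap=1)
merged = decluster(series, threshold=4, min_gap=2)
```

With `min_gap=1` the exceedances 5 and 7 are treated as two clusters; with `min_gap=2` the single low value between them is not enough to separate them, and only 7 is kept for that wave.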

Entropy. The entropy H(X) of a random variable X with density f(x) is a measure 
of the information quantity contained in the distribution of X. In the discrete case, 
Shannon’s entropy formula defines it as the minimum number of bits required to 
reconstruct a message coded in the data. The concept can be extended to the continuous 
case with the formula H(X) = -E[log f(X)]. Every integrable density 
has finite entropy, and the greater the entropy, the less informative the distribution 
f is considered to be. This concept is one of the most important in Shannon’s 
(probabilistic) information theory, and also corresponds to the notion of entropy 
in the physics (thermodynamics) sense. See [185] for more details. 
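
For the discrete case, the following Python sketch computes Shannon's entropy in bits from a list of probabilities. The function name and the example distributions are ours; terms with zero probability are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = shannon_entropy_bits([0.25, 0.25, 0.25, 0.25])  # least informative
peaked = shannon_entropy_bits([0.5, 0.25, 0.25])          # more informative
```

The uniform distribution over four outcomes has the larger entropy of the two, in line with the statement that greater entropy means a less informative distribution.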

Eustatic change. Eustatic change is a measure of variation in the mean sea 
level with respect to the continents, which are assumed to be fixed. This cyclic phenomenon of 
rising and falling sea level, on the order of mm/year, has several causes: 
melting glaciers, changes in the tectonic plates, rate of activity in oceanic ridges, 
etc. 

Extreme tide. An extreme tide is an abnormally high or low tide, induced 
by unusual weather conditions on top of the usual tides induced by the moon and 
sun. 

Flood plain. A watercourse’s flood plain is the zone over which water flows if it 
breaks its usual banks; this can be a vast area if flooding occurs 
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Gauging. In hydrology, gauging is the process of measuring water flow at a gaug- 
ing station at a given time. This can be performed using kinematics, dynamics 
(hydraulics), or physics (using dilution). 

Heteroskedasticity. Heteroskedasticity describes the nature of a collection of random 
variables if some subcollections of these variables have different values of 
variability (e.g., variances). 

High tide exceedance. A high tide exceedance is a positive high tide residual, or 
surge. 

Hydro Bank. Hydro Bank (www.hydro.eaufrance.fr) is a department of the French 
Ministry of Ecology, Sustainable Development and Energy, storing the water 
level values recorded at stations in service on French rivers by the State services 
(Regional Directorates for Environment, Development and Housing (DREAL), 
Departmental Directorates for Territories (DDT), water agencies, etc.), companies 
such as EDF, development companies such as the Compagnie Nationale du Rhône, 
and various research organizations (IRSTEA, universities, etc.). 

Hydrograph. A hydrograph is a plot of the temporal variation in water flow, 
either at a specific place in a watershed (e.g., the outlet) or at some point along a 
watercourse. A flood, shown on a hydrograph, is characterized by a rising limb, 
a peak, followed by a recession limb. 

Industrial risk. Industrial risk is generally defined as the product of the probability 
of occurrence of a hazard and a measure of the cost incurred by it. Note that according 
to the ISO Guide 73:2009 standard [7], a less formal and more general definition of 
risk is the effect of uncertainty on the achievement of objectives. 

Kurtosis. The kurtosis or Pearson flattening coefficient of a probability distribution 
of a real random variable is defined as the fourth order moment of this reduced 
centered variable. 

Non-inductive logic.  Non-inductive logic is a type of logic that is not based on 
the observation of specific events in order to deduce the general behavior of a 
phenomenon. 

Outlet (of a watershed or basin). The outlet of a watershed or basin is the point 
of convergence of all waters belonging to it. It can be a lake, watercourse, sea, 
ocean, etc. 

Outlier. An outlier is a data point whose value is significantly different from the other 
ones in a statistical sample [619]. 

Peak flow. Peak flow is the maximum instantaneous flow recorded during a high 
water event. 

Phenology. Phenology is the study of the appearance of periodic (generally annual) 
events that depend on seasonal variation in the climate, usually in terms of living 
things (trees, vines, etc.). It can help us to distinguish markers of catastrophic 
natural events from the past, and to reconstruct historical extreme values. 

Possibility theory. Possibility theory is a mathematical theory, alternative to prob- 
ability theory, for dealing with certain types of uncertainty. 

Reinsurance. Reinsurance means insuring insurers themselves. It works by transferring 
to a company (the reinsurer) a random risk (typically, the consequences 
of an extreme hazard event) in return for payment corresponding to the transferred 
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risk and its characteristics. This mechanism originates in the industrial revolution 
in Germany, where insurance companies were unable to cover the totality of risk 
potentially affecting rapidly developing industry. 

Risk aversion. Risk aversion is a behavior that pushes an arbitrator (real or virtual, 
such as a questioned expert) to avoid any (possible) consequences of their 
arbitration being turned against them. This leads to a bias corresponding 
to excessive caution, or even a refusal to provide expertise. Thus, an expert questioned 
about the imminence of an extreme hazard may tend to over-estimate its 
likelihood, because a posteriori, an under-estimation could be much more costly 
to them professionally. 

Risk neutrality. A risk neutral arbitrator is one whose behavior is not affected by 
personal biases. 

River bed. The river bed of a stream, river or creek, is the physical space in which 
water flows normally, held in by the watercourse’s banks. 

SHOM. The Service Hydrographique et Océanographique de la Marine (SHOM) 
is a French public administrative body, under the supervision of the Ministry 
of Defense, whose mission is to understand and describe the oceanic physical 
environment in its relationship with the atmosphere, seabed and littoral zones, 
to predict the evolution of these interactions, and to disseminate this information 
(www.shom.fr). Its flagship activity is to build reference databases characterizing 
the geophysical, maritime, and coastal environments. 

Skewness. The skewness or asymmetry of a probability distribution of a real ran- 
dom variable is defined as the third order moment of this reduced centered variable. 
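
The moment of a reduced centered variable can be estimated from a sample as in the Python sketch below. The function name and the samples are ours, and the population (divide by n) variance is assumed. Order 3 gives the skewness; the same function with order 4 gives the fourth-order analogue commonly called kurtosis.

```python
def standardized_moment(sample, order):
    """Empirical moment of the given order of the reduced centered sample."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / n  # population variance
    if variance == 0.0:
        raise ValueError("sample is constant; moment undefined")
    std = variance ** 0.5
    return sum(((x - mean) / std) ** order for x in sample) / n


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 10.0]

skew_symmetric = standardized_moment(symmetric, 3)
skew_right = standardized_moment(right_tailed, 3)
kurt_symmetric = standardized_moment(symmetric, 4)
```

A symmetric sample has a skewness of zero, while a sample with one large value on the right has a positive skewness.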

Spillway. A spillway is a structure attached to dams which helps to control the 
maximum water level behind them. It does so by diverting water down itself if 
more water arrives at the dam than can be safely stored behind it, thus protecting 
the structure. 

Swell. Swell is a relatively regular wave movement of the sea surface. Unlike chop, 
swell is generated by wind blowing far from the coast. It is more regular 
than chop in terms of period and height. 

Tidal range. The tidal range is the difference in level between the high and low water 
of a tide. Specifically, it is generally defined, for a given day and within a high 
water—low water interval, as the difference in water level between the high water 
level and the low water level immediately following or preceding it. It is sometimes 
referred to as tide height (a term sometimes also used for height of water) or tide 
amplitude (referring to half tide, i.e., the difference in height of water at high or 
low water with that at mid-tide). 

Watershed. A watershed is a territory, surrounded by natural boundaries, whose 
waters all converge toward the same exit point—called an outlet. 

Water status. Water status refers to the quantity of water retained in soil (quali- 
tatively: from very dry to very wet soil). 
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Tools Used by EDF 


The calculations for most of the examples reported in this book were carried out 
using freely available R packages dedicated to extreme value 
statistics. See http://www.r-project.org for more information on the R environment. 

Other software tools are used at EDF. The ASTEX software was developed 
by the LNHE to merge, restructure, and improve three existing tools for fitting 
extreme value distributions to time series of flows (CRUE), swells (EVENAL) and 
excess tides (SWELL). Today ASTEX is used in the following organizations (see [40, 
57] for case studies): 


• the National Laboratory of Hydraulics and Environment (LNHE), belonging to 
EDF Research & Development; 

• the General Technical Division (DTG), an entity of EDF, in particular in charge 
of the measurements and expertise concerning the hydrometric and meteorological 
forecasts necessary for the proper functioning of EDF’s hydroelectric and nuclear 
fleet; 

• the Centre for Studies and Expertise on Risks, the Environment, Mobility and Development 
(CEREMA): https://www.cerema.fr/. 


We can also mention the EXTREMES software tool, which runs in a MATLAB¹ 
environment and gathers different tools dedicated to the study of extreme 
values. It is the result of a collaboration between the IS2 team of INRIA Rhône-Alpes 
and EDF R&D. A toolbox running in a SCILAB environment is currently 
being developed at the LNHE. 


Useful R Packages 


The following R packages were used for the treatment of the case studies: 


• evd (functions for Extreme Value Distributions) 

• evir (Extreme Values in R) 

• extRemes (Extreme value toolkit): its graphical interface makes it simple 
to use; various methods used in [173] 

• ismev (An Introduction to Statistical Modeling of Extreme Values) 

• Renext (Renewal method for extreme values extrapolation): developed by IRSN, 
with a graphical user interface in R (RenextGUI) 

• POT (Peaks Over Threshold) 

• RFA (Regional Frequency Analysis) 

• nsRFA (Non-supervised Regional Frequency Analysis) 

• chron (Chronological objects which can handle dates and times) 

• Kendall (Kendall rank correlation and Mann-Kendall trend test) 

• texmex (Statistical Modelling of Extreme Values) 


¹ http://mistis.inrialpes.fr/software/EXTREMES/accueil.html 


Note that some example studies in R, using the graphical 
interface of the extRemes package on wind data, are described in 
detail in [576]. 


Other Software Tools 


The S-Plus environment, a competitor of R, also offers a set of paid libraries 
for extreme value statistics (mostly oriented toward finance): the EVIS library, 
Stuart Coles’s routines, and the S+FinMetrics module. 


Software tools comparable to EXTREMES are also available from RISKTEC 
(Xtremes: http://www.risktec.de/software.htm) and WRPLLC (HYFRAN: 
http://www.wrpllc.com). They offer a wide range of tools and methods to evaluate 
extremes, as well as an easy-to-use graphical interface. 
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